Hi,
On Sat, Jun 25, 2011 at 3:44 PM, Wes McKinney <[email protected]> wrote:
...
> Here are some things I can think of that would be affected by any changes here
>
> 1) Right now users of pandas can type pandas.isnull(series[5]) and
> that will yield True if the value is NA for any dtype. This might be
> hard to support in the masked regime
But, following the NEP, I could imagine something like this:
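    def isnull(a):
        # No mask attached means every element is valid, so nothing is NA
        if a.validitymask is None:
            return np.zeros(a.shape, dtype=bool)
        # validitymask is True where the value is valid; NA is its negation
        return a.validitymask == False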
I suppose the return array in this case would be 0d bool. Would that
not serve here?
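
To make the 0d case concrete, here is a minimal runnable sketch. The
ValidityArray class is a hypothetical stand-in I made up for illustration;
it is not a NumPy class and not the NEP's implementation. It only assumes
what is said above: an array carries a validitymask that is either None or
True wherever the value is valid. With no mask, the sketch returns all
False, since an array without a mask has no NA values.

```python
import numpy as np


class ValidityArray:
    """Hypothetical stand-in for a NEP-style masked array (illustration only)."""

    def __init__(self, data, validitymask=None):
        self.data = np.asarray(data)
        # validitymask is True where the value is valid, or None for "no mask"
        self.validitymask = (
            None if validitymask is None else np.asarray(validitymask, dtype=bool)
        )
        self.shape = self.data.shape

    def __getitem__(self, key):
        # Index the data and the mask together so elements keep their NA state
        mask = None if self.validitymask is None else self.validitymask[key]
        return ValidityArray(self.data[key], mask)


def isnull(a):
    # No mask means every element is valid, so nothing is NA
    if a.validitymask is None:
        return np.zeros(a.shape, dtype=bool)
    return ~a.validitymask


series = ValidityArray([1.0, 2.0, 3.0], [True, False, True])
element_is_null = isnull(series[1])  # 0d boolean result for a single element
```

The result for a single element has shape () and can be used directly in
an "if" test, which seems to be what pandas.isnull(series[5]) needs.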
> 2) Functions like {Series, DataFrame}.fillna would hopefully look just
> like this:
>
> # value is 0 or some other value to fill
> new_series = self.copy()
> new_series[isnull(new_series)] = value
Either isnull as above, or:
    new_series = new_series.fill_masked(value)
?
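
Both styles can be tried today with the existing numpy.ma module, as a
rough analogue only. Note the assumptions: numpy.ma uses the opposite
convention from the NEP's validitymask (its mask is True where the value
is missing), its method is called filled() rather than the fill_masked
name I used above, and the isnull helper here is just a thin wrapper I
wrote for the sketch.

```python
import numpy as np

# numpy.ma convention: mask is True where the value is missing
series = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
other = np.ma.masked_array([4.0, 5.0, 6.0], mask=[True, True, False])
value = 0.0


def isnull(a):
    # Full-shape boolean array, True where the element is masked
    return np.ma.getmaskarray(a)


# Style 1: boolean-index assignment, as in the pandas fillna example
filled_by_index = series.copy()
filled_by_index[isnull(series)] = value

# Style 2: a single method call; filled() returns a plain ndarray
filled_by_method = series.filled(value)

# Custom NA logic: combine two null masks before assigning
both_null = isnull(series) & isnull(other)
custom = series.copy()
custom[both_null] = -1.0
```

The boolean-index style keeps working for custom logic because isnull
returns an ordinary boolean array that can be combined with & and |.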
> Keep in mind that people will write custom NA handling logic. So they might
> do:
>
> series[isnull(other_series) & isnull(other_series2)] = val
> 3) Nulling / NA-ing out data is very common
>
> # null out this data up to and including date1 in these three columns
> frame.ix[:date1, [col1, col2, col3]] = NaN
I think Mark is proposing that this:
    frame.ix[:date1, [col1, col2, col3]] = np.NA
will work - maybe he can correct me if I'm wrong?
> I'll try to think of some others. The main thing is that the NA value
> is very easy to think about and fits in naturally with how people (at
> least statistical / financial users) think about and work with data.
> If you have to say "I have to set these mask locations to True" it
> introduces additional mental effort compared with "I'll just set these
> values to NA"
I could imagine making the API such that, in practice, you would be
thinking that you were setting the values to NA, even though you were
in fact setting a mask.
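
As a rough illustration of that idea, numpy.ma already behaves this way:
assigning the np.ma.masked constant reads like "set these values to NA",
but only the mask changes. This is a sketch with the existing module, not
the np.NA syntax from the proposal, and the small 3x4 array stands in for
the frame in the example above.

```python
import numpy as np

# A 3x4 array of floats with no missing values to start with
frame = np.ma.masked_array(np.arange(12.0).reshape(3, 4))

# Reads as "set rows 0-1 of columns 0 and 2 to NA" ...
frame[:2, [0, 2]] = np.ma.masked

# ... but under the hood only the mask was changed
na_mask = np.ma.getmaskarray(frame)
underlying = frame.data  # the original values are still stored
```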
My own worry here is not about the API, but the implementation. I'm
worried that the masked approach will use more memory, and I don't know
how we can be sure whether it will be faster without implementing both.
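
For a rough sense of the memory side, here is a back-of-the-envelope
sketch. It assumes the mask is stored as one boolean byte per element,
which is how numpy.ma stores its mask; the proposed implementation might
store it differently. It says nothing about speed.

```python
import numpy as np

n = 1_000_000
data = np.zeros(n, dtype=np.float64)  # 8 bytes per element
mask = np.zeros(n, dtype=bool)        # assumed: 1 byte per element

# Extra memory for the mask, as a fraction of the data itself
overhead = mask.nbytes / data.nbytes
```

Under that assumption the overhead is one eighth for float64 data, and
proportionally larger for narrower dtypes.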
See you,
Matthew