On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris <[email protected]> wrote:
>
>
> On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris
> <[email protected]> wrote:
>>
>>
>> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney <[email protected]>
>> wrote:
>>>
>>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing <[email protected]> wrote:
>>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
>>> >
>>> >> like. And in this case I do think we can come up with an API that will
>>> >> make everyone happy, but that Mark's current API probably can't be
>>> >> incrementally evolved to become that API.)
>>> >>
>>> >
>>> > No one could object to coming up with an API that makes everyone happy,
>>> > provided that it actually gets coded up, tested, and is found to be
>>> > fast and maintainable. When you say the API probably can't be evolved,
>>> > do you mean that the underlying implementation also has to be redone?
>>> > And if so, who will do it, and when?
>>> >
>>> > Eric
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > [email protected]
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>>
>>> I personally am a bit apprehensive as I am worried about the masked
>>> array abstraction "leaking" through to users of pandas, something
>>> which I simply will not accept (why I decided against using numpy.ma
>>> early on, that + performance problems). Basically if having an
>>> understanding of masked arrays is a prerequisite for using pandas, the
>>> whole thing is DOA to me as it undermines the usability arguments I've
>>> been making about switching to Python (from R) for data analysis and
>>> statistical computing.
>>
>> The missing data functionality looks far more like R than numpy.ma.
>>
>
> For instance
>
> In [8]: a = arange(5, maskna=1)
>
> In [9]: a[2] = np.NA
>
> In [10]: a.mean()
> Out[10]: NA(dtype='float64')
>
> In [11]: a.mean(skipna=1)
> Out[11]: 2.0
>
> In [12]: a = arange(5)
>
> In [13]: b = a.view(maskna=1)
>
> In [14]: a.mean()
> Out[14]: 2.0
>
> In [15]: b[2] = np.NA
>
> In [16]: b.mean()
> Out[16]: NA(dtype='float64')
>
> In [17]: b.mean(skipna=1)
> Out[17]: 2.0
>
> Chuck
>

I don't really agree with you.

Some sample R code:

> arr <- rnorm(10)
> arr[5:8] <- NA
> arr
 [1]  0.6451460 -1.1285552  0.6869828  0.4018868         NA         NA
 [7]         NA         NA  0.3322803 -1.9201257

In R, missing values work on any vector with no opt-in. In your examples
you had to pass maskna=True. I suppose that my only recourse would be to
make sure that every array inside a DataFrame, for example, has
maskna=True set. I'll have to look in more detail and see if it's
feasible/desirable. There's a memory cost to pay, but you can't get the
functionality for free.

I may just end up sticking with NaN, as it has worked pretty well over
the last few years. It's an impure solution, but one with reasonably good
performance characteristics in the places that matter.
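
To make the trade-off above concrete, here is a minimal sketch of the NaN convention next to a mask-based array. It is only an illustration under stated assumptions: the data values are made up, numpy.ma stands in for "some mask-based design" (it is not the maskna API from the quoted examples), and none of this is meant to show how pandas handles missing data internally.

```python
import numpy as np

# Hypothetical data: ten floats with positions 4..7 marked missing via NaN,
# mirroring the R snippet's arr[5:8] <- NA (R is 1-indexed and inclusive).
arr = np.arange(10, dtype=np.float64)
arr[4:8] = np.nan

# A plain reduction propagates NaN, much as mean(arr) gives NA in R.
plain_mean = arr.mean()

# Skipping missing values has to be requested explicitly.
skip_mean = np.nanmean(arr)

# NaN != NaN, so missing entries are counted with np.isnan, not ==.
n_missing = int(np.isnan(arr).sum())

# Stand-in for a mask-based design (assumption: numpy.ma, not maskna).
# The mask is a separate boolean array, one byte per element here.
masked = np.ma.masked_invalid(arr)
mask_bytes = np.ma.getmaskarray(masked).nbytes
masked_mean = masked.mean()

print(plain_mean, skip_mean, n_missing, arr.nbytes, mask_bytes, masked_mean)
```

The NaN version stores nothing beyond the float data, but it only works for floating-point dtypes. The masked version carries the extra boolean array, which is the memory cost mentioned above.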
