On Mon, Oct 24, 2011 at 11:12 AM, Wes McKinney <[email protected]> wrote:
> On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris
> <[email protected]> wrote:
> >
> >
> > On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris
> > <[email protected]> wrote:
> >>
> >>
> >> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney <[email protected]>
> >> wrote:
> >>>
> >>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing <[email protected]> wrote:
> >>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
> >>> >
> >>> >> like. And in this case I do think we can come up with an API that will
> >>> >> make everyone happy, but that Mark's current API probably can't be
> >>> >> incrementally evolved to become that API.)
> >>> >>
> >>> >
> >>> > No one could object to coming up with an API that makes everyone happy,
> >>> > provided that it actually gets coded up, tested, and is found to be fast
> >>> > and maintainable. When you say the API probably can't be evolved, do
> >>> > you mean that the underlying implementation also has to be redone? And
> >>> > if so, who will do it, and when?
> >>> >
> >>> > Eric
> >>> > _______________________________________________
> >>> > NumPy-Discussion mailing list
> >>> > [email protected]
> >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>> >
> >>>
> >>> I personally am a bit apprehensive, as I am worried about the masked
> >>> array abstraction "leaking" through to users of pandas, something
> >>> which I simply will not accept (which is why I decided against using
> >>> numpy.ma early on -- that, plus performance problems). Basically, if
> >>> having an understanding of masked arrays is a prerequisite for using
> >>> pandas, the whole thing is DOA to me, as it undermines the usability
> >>> arguments I've been making about switching to Python (from R) for
> >>> data analysis and statistical computing.
> >>
> >> The missing data functionality looks far more like R than numpy.ma.
> >>
> >
> > For instance
> >
> > In [8]: a = arange(5, maskna=1)
> >
> > In [9]: a[2] = np.NA
> >
> > In [10]: a.mean()
> > Out[10]: NA(dtype='float64')
> >
> > In [11]: a.mean(skipna=1)
> > Out[11]: 2.0
> >
> > In [12]: a = arange(5)
> >
> > In [13]: b = a.view(maskna=1)
> >
> > In [14]: a.mean()
> > Out[14]: 2.0
> >
> > In [15]: b[2] = np.NA
> >
> > In [16]: b.mean()
> > Out[16]: NA(dtype='float64')
> >
> > In [17]: b.mean(skipna=1)
> > Out[17]: 2.0
> >
> > Chuck
>
> I don't really agree with you.
>
> Some sample R code:
>
> > arr <- rnorm(10)
> > arr[5:8] <- NA
> > arr
>  [1]  0.6451460 -1.1285552  0.6869828  0.4018868         NA         NA
>  [7]         NA         NA  0.3322803 -1.9201257
>
> In your examples you had to pass maskna=True -- I suppose that my only
> recourse would be to make sure that every array inside a DataFrame,
> for example, has maskna=True set. I'll have to look in more detail and
> see if it's feasible/desirable. There's a memory cost to pay, but you
> can't get the functionality for free. I may just end up sticking with
> NaN, as it's worked pretty well over the last few years -- it's an
> impure solution, but one with reasonably good performance
> characteristics in the places that matter.

It might be useful to have a way of setting global defaults, or something
like a with statement. These are the sort of things that can be adjusted
based on experience. For instance, I'm thinking skipna=1 is the natural
default for the masked arrays.

Chuck
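As a point of comparison with the maskna/skipna session above, here is a minimal sketch of the NaN-sentinel approach Wes describes sticking with, written against the plain NumPy API (note that np.nanmean is shown for convenience; it postdates this thread, which is why the explicit boolean-mask form is also given):

```python
import numpy as np

# NaN-as-missing: a float array uses NaN as the missing-value sentinel,
# so no separate mask array (and no extra memory) is needed.
arr = np.arange(5, dtype=np.float64)
arr[2] = np.nan                      # mark position 2 as missing

# A plain mean propagates the missing value, like skipna=False above.
print(np.mean(arr))                  # nan

# Skipping missing values, like skipna=True: mask out the NaNs by hand...
print(arr[~np.isnan(arr)].mean())    # 2.0

# ...or equivalently with np.nanmean.
print(np.nanmean(arr))               # 2.0
```

The trade-off Wes alludes to: this costs nothing in memory but only works for floating-point dtypes, since integer arrays have no NaN to spare as a sentinel.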
