On Fri, Jun 24, 2011 at 11:59 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Fri, Jun 24, 2011 at 6:57 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>> On Fri, Jun 24, 2011 at 8:11 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>> This is a situation where I would just... use an array and a mask,
>>> rather than a masked array. Then lots of things -- changing fill
>>> values, temporarily masking/unmasking things, etc. -- come for free,
>>> just from knowing how arrays and boolean indexing work?
>>
>> With a masked array, it is "for free". Why re-invent the wheel? It has
>> already been done for me.
>
> But it's not for free at all. It's an additional concept that has to
> be maintained, documented, and learned (with the last cost, which is
> multiplied by the number of users, being by far the greatest). It's
> not reinventing the wheel, it's saying hey, I have wheels and axles,
> but what I really need the library to provide is a wheel+axle
> assembly!

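For anyone following along, the "array and a mask" approach from the quoted text might look roughly like the sketch below. The sample data and the -999.0 sentinel are made up for illustration and are not from the thread; the numpy.ma lines at the end are only there for comparison.

```python
import numpy as np

# Hypothetical sample data; -999.0 is an assumed sentinel for a bad reading.
data = np.array([1.0, 2.0, -999.0, 4.0, 5.0])
mask = data == -999.0            # True where the value is invalid

# Aggregate over valid entries only, using plain boolean indexing.
valid_mean = data[~mask].mean()

# "Changing the fill value" is np.where with a different constant.
filled_zero = np.where(mask, 0.0, data)
filled_nan = np.where(mask, np.nan, data)

# "Temporarily unmasking" means working on a modified copy of the mask;
# the original mask is left alone.
temp_mask = mask.copy()
temp_mask[2] = False
n_visible = int((~temp_mask).sum())

# The numpy.ma version of the same aggregate, for comparison.
marr = np.ma.masked_array(data, mask=mask)
ma_mean = marr.mean()
```

Both routes give the same mean here; the difference under discussion is whether the mask travels with the array automatically or is carried by hand.
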
You're communicating my argument better than I am.

>>> Do we really get much advantage by building all these complex
>>> operations in? I worry that we're trying to anticipate and write code
>>> for every situation that users find themselves in, instead of just
>>> giving them some simple, orthogonal tools.
>>>
>>
>> This is the danger, which is why I advocate retaining the MaskedArray
>> type that would provide the high-level "intelligent" operations, meanwhile
>> having in the core the basic data structures for pairing a mask with an
>> array, and to recognize a special np.NA value that would act upon the mask
>> rather than the underlying data. Users would get very basic functionality,
>> while the MaskedArray would continue to provide the interface that we are
>> used to.
>
> The interface as described is quite different... in particular, all
> aggregate operations would change their behavior.
>
>>> As a corollary, I worry that learning and keeping track of how masked
>>> arrays work is more hassle than just ignoring them and writing the
>>> necessary code by hand as needed. Certainly I can imagine that *if the
>>> mask is a property of the data* then it's useful to have tools to keep
>>> it aligned with the data through indexing and such. But some of these
>>> other things are quicker to reimplement than to look up the docs for,
>>> and the reimplementation is easier to read, at least for me...
>>
>> What you are advocating is similar to the "tried-n-true" coding practice of
>> Matlab users of using NaNs. You will hear from Matlab programmers about how
>> it is the greatest idea since sliced bread (and I was one of them). Then I
>> was introduced to Numpy, and while I do sometimes still do the NaN
>> approach, I realized that the masked array is a "better" way.
>
> Hey, no need to go around calling people Matlab programmers, you might
> hurt someone's feelings.
>
> But seriously, my argument is that every abstraction and new concept
> has a cost, and I'm dubious that the full masked array abstraction
> carries its weight and justifies this cost, because it's highly
> redundant with existing abstractions. That has nothing to do with how
> tried-and-true anything is.

+1. I think I will personally only be happy if "masked array" can be
implemented while incurring near-zero cost from the end user
perspective. If what we end up with is a faster implementation of
numpy.ma in C I'm probably going to keep on using NaN... That's why
I'm entirely insistent that whatever design be dogfooded on non-expert
users. If it's very much harder / trickier / nuanced than R, you will
have failed.

>> As for documentation, on hard/soft masks, just look at the docs for the
>> MaskedArray constructor:
> [...snipped...]
>
> Thanks!
>
> -- Nathaniel
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
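
As a rough illustration of the two conventions contrasted in this message (NaN as the missing-value marker versus numpy.ma), plus the hard/soft mask distinction mentioned in the quoted text, here is a minimal sketch. The data values are invented for the example; np.nanmean is a present-day NumPy function and is used here only as a convenient NaN-aware aggregate.

```python
import numpy as np

# Hypothetical readings with one missing value (made up for this sketch).
raw = [1.0, 2.0, 3.0]
missing = [False, True, False]

# NaN convention: the missing entry is stored as NaN in the data itself.
as_nan = np.where(missing, np.nan, raw)
nan_mean = np.nanmean(as_nan)          # NaN-aware mean over the valid entries

# numpy.ma convention: the data is kept and a mask rides alongside it.
soft = np.ma.array(raw, mask=missing)                  # soft mask (default)
hard = np.ma.array(raw, mask=missing, hard_mask=True)  # hard mask
ma_mean = soft.mean()

# Assigning to a masked entry unmasks it under a soft mask...
soft[1] = 10.0
# ...but leaves the entry masked under a hard mask.
hard[1] = 10.0

soft_still_masked = bool(soft.mask[1])
hard_still_masked = bool(hard.mask[1])
```

The aggregates agree; what differs is how much machinery (mask objects, hard versus soft semantics) the user has to learn to get there, which is the cost being argued about.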