On Fri, Jun 24, 2011 at 10:59 PM, Nathaniel Smith <[email protected]> wrote:
> On Fri, Jun 24, 2011 at 6:57 PM, Benjamin Root <[email protected]> wrote:
> > On Fri, Jun 24, 2011 at 8:11 PM, Nathaniel Smith <[email protected]> wrote:
> >> This is a situation where I would just... use an array and a mask,
> >> rather than a masked array. Then lots of things -- changing fill
> >> values, temporarily masking/unmasking things, etc. -- come for free,
> >> just from knowing how arrays and boolean indexing work?
> >
> > With a masked array, it is "for free". Why re-invent the wheel? It has
> > already been done for me.
>
> But it's not for free at all. It's an additional concept that has to
> be maintained, documented, and learned (with the last cost, which is
> multiplied by the number of users, being by far the greatest). It's
> not reinventing the wheel, it's saying hey, I have wheels and axles,
> but what I really need the library to provide is a wheel+axle
> assembly!
>

It feels like you're suggesting the NA bit pattern vs mask distinction and the programming interface users of NumPy see are closely tied together. This isn't the case at all, and I would like more feedback on the interface side of things irrespective of the implementation details. Please tell me what your wheel+axle assembly looks like.

> >> Do we really get much advantage by building all these complex
> >> operations in? I worry that we're trying to anticipate and write code
> >> for every situation that users find themselves in, instead of just
> >> giving them some simple, orthogonal tools.
> >>
> >
> > This is the danger, and which is why I advocate retaining the MaskedArray
> > type that would provide the high-level "intelligent" operations, meanwhile
> > having in the core the basic data structures for pairing a mask with an
> > array, and to recognize a special np.NA value that would act upon the mask
> > rather than the underlying data. Users would get very basic functionality,
> > while the MaskedArray would continue to provide the interface that we are
> > used to.
>
> The interface as described is quite different... in particular, all
> aggregate operations would change their behavior.
>

Which operations are changing, and what is the difference in behavior? I don't recall proposing something like this. My initial proposal had a difference with R for the aggregate operations, but I've changed the NEP based on your feedback.

> >> As a corollary, I worry that learning and keeping track of how masked
> >> arrays work is more hassle than just ignoring them and writing the
> >> necessary code by hand as needed. Certainly I can imagine that *if the
> >> mask is a property of the data* then it's useful to have tools to keep
> >> it aligned with the data through indexing and such. But some of these
> >> other things are quicker to reimplement than to look up the docs for,
> >> and the reimplementation is easier to read, at least for me...
> >
> > What you are advocating is similar to the "tried-n-true" coding practice of
> > Matlab users of using NaNs. You will hear from Matlab programmers about how
> > it is the greatest idea since sliced bread (and I was one of them). Then I
> > was introduced to Numpy, and while I do sometimes still do the NaN
> > approach, I realized that the masked array is a "better" way.
>
> Hey, no need to go around calling people Matlab programmers, you might
> hurt someone's feelings.
>
> But seriously, my argument is that every abstraction and new concept
> has a cost, and I'm dubious that the full masked array abstraction
> carries its weight and justifies this cost, because it's highly
> redundant with existing abstractions. That has nothing to do with how
> tried-and-true anything is.
>

The abstraction is R-like missing values, and two implementation mechanisms are NA bit patterns and masks. There is no "full masked array abstraction" as a component end users will have to learn.

-Mark

> > As for documentation, on hard/soft masks, just look at the docs for the
> > MaskedArray constructor:
> [...snipped...]
>
> Thanks!
>
> -- Nathaniel
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
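For concreteness, the two styles being compared in this thread (a plain array with a separate boolean mask versus numpy.ma.MaskedArray) can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample values, the -999.0 placeholder, and the variable names are made up, and the sketch uses the existing numpy.ma module rather than the np.NA value discussed in the NEP.

```python
import numpy as np

# Made-up sample data; -999.0 is just a placeholder for a missing reading.
data = np.array([1.0, 2.0, -999.0, 4.0])
mask = np.array([False, False, True, False])  # True marks a missing entry

# Style 1: plain array plus a separate boolean mask.
plain_mean = data[~mask].mean()            # aggregate over valid entries only
plain_filled = np.where(mask, 0.0, data)   # pick any fill value at use time

# "Temporarily unmasking" is just working with a modified copy of the mask.
temp_mask = mask.copy()
temp_mask[2] = False
unmasked_count = int((~temp_mask).sum())

# Style 2: numpy.ma.MaskedArray carries the mask alongside the data.
marr = np.ma.masked_array(data, mask=mask)
ma_mean = marr.mean()                      # masked entries are skipped
ma_filled = marr.filled(0.0)               # returns a plain ndarray
```

Both styles give the same mean and the same filled array here; the disagreement in the thread is about which interface users should have to learn, not about the results.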
