On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett <matthew.br...@gmail.com> wrote:
> In the interest of making the discussion as concrete as possible, here
> is my draft of an alternative proposal for NAs and masking, based on
> Nathaniel's comments. Writing it, it seemed to me that Nathaniel is
> right, that the ideas become much clearer when the NA idea and the
> MASK idea are separate. Please do pitch in for things I may have
> missed or misunderstood:
[...]

Thanks for writing this up! I stuck it up as a gist so we can edit it more easily:
  https://gist.github.com/1056379/

This is your initial version:
  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191

And I made a few changes:
  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583

Specifically, I added a rationale section, changed np.MASKED to np.IGNORE (as per comments in this thread), and added a vowel to "propmsk".

One thing I wonder about the design is whether having an np.MASKED/np.IGNORE value at all helps or hurts. (Occam tells us never to multiply entities without necessity! And it's a bit of an odd fit to the masking concept, since the whole idea is that masking is a property of the array, not the individual datums.)

Currently, I see the following uses for it:
-- As a return value when someone tries to scalar-index a masked value
-- As a placeholder to specify masked values when creating an array from a list (but not when assigning to an array later)
-- As a return value when using propmask=True
-- As something to display when printing a masked array

Another way of doing things would be:
-- Scalar-indexing a masked value returns an error, like trying to index past the end of an array. (Slicing etc. would still return a new masked array.)
-- Having some sort of placeholder does seem nice, but I'm not sure how often you need to type out a masked array. And I notice that numpy.ma does support this (like so: ma.array([1, ma.masked, 3])) but the examples in the docs never use it. The replacement idiom would be something like:
     my_data = np.array([1, 999, 3], masked=True)
     my_data.visible = (my_data != 999)
   So maybe just leave out the placeholder value, at least for version 1?
-- I don't really see the logic for supporting 'propmask' at all. AFAICT no-one has ever even considered this as a useful feature for numpy.ma, never mind implemented it?
-- When printing, the numpy.ma approach of using "--" seems much more readable to me than having "IGNORE" all over my screen.

So overall, making these changes would let us simplify the design. But maybe propmask is really critical for some use case, or there's some good reason to want to scalar-index missing values without getting an error?

-- Nathaniel
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion