On Sun, Nov 6, 2011 at 4:43 PM, Nathaniel Smith <n...@pobox.com> wrote:
> Hi matplotters,
>
> As any of you subscribed to the numpy-discussion list will have
> probably noticed, there's intense debate going on about how numpy can
> do a better job of handling missing data and masked arrays. Part of
> the problem is that we aren't actually sure what users need these
> features to do. There's one group who just wants R-style "missing
> data", and their needs are pretty straightforward -- they just want a
> magic value that indicates some data point doesn't actually exist. But
> it seems like there's also demand for a more "masked array"-like
> feature, similar to the current numpy.ma, where the mask is
> non-destructive and easily manipulable. No one seems clear on how
> exactly this should work, though, and there's a lot of disagreement
> about what semantics make sense. (If you want more details, there's a
> wiki page summarizing some of this[1]).
>
> Since you seem to be the biggest users of numpy.ma, it would be really
> helpful if you could explain how you actually use it, so we can make
> sure that whatever we do in numpy-land is actually useful to you!
>
> What does matplotlib use masked arrays for? Is it just a convenient
> way to keep an array and a boolean mask together in one object, or do
> you take advantage of more numpy.ma features? For example, do you
> ever:
> - unmask values?
> - create multiple arrays that share the same storage for their data,
> but have different masks? (i.e., creating a new array with new
> elements masked, but without actually allocating the memory for a full
> array copy)
> - use reduction operations on masked arrays? (e.g., np.sum(masked_arr))
> - use binary operations on masked arrays? (e.g., masked_arr1 +
> masked_arr2)
>
> And while we're at it, any complaints about how numpy.ma works now,
> that a new version might do better?
>
> Thanks in advance,
> -- Nathaniel
>
> [1] https://github.com/njsmith/numpy/wiki/NA-discussion-status
>
>
Hi Nathaniel,
Unfortunately, I can't spend much more time on this topic due to my
dissertation work. I will allow others to elaborate further, if they wish.
But I think I can summarize it a bit.
First, we try our best to respect the multiple ways users specify missing
data as input to our main plotting functions. The most common are NaNs and
np.ma masks. Given that we try to maintain compatibility with older
versions of Numpy, we are going to have to build some sort of compatibility
mechanism to unify any representation (NaNs, np.ma, NA (or whatever it will
be called)) under a single abstraction to be used internally. This will
probably be np.ma at first, until we can depend on the existence of np.NA.
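
As a rough illustration of that kind of unifying layer, here is a minimal
sketch, not matplotlib's actual code. The helper name as_masked is
hypothetical, and it only covers NaNs and np.ma input; np.NA is left out
because the sketch cannot rely on it existing.

```python
import numpy as np

def as_masked(values):
    """Hypothetical helper: present NaN- or mask-based missing data as np.ma."""
    data = np.ma.getdata(values)               # plain ndarray view of the input
    mask = np.ma.getmaskarray(values).copy()   # copy so the user's mask is never shared
    if data.dtype.kind in "fc":                # only float/complex data can hold NaN
        mask |= np.isnan(data)
    return np.ma.array(data, mask=mask)        # np.ma.array does not copy data by default

plain = np.array([1.0, np.nan, 3.0])
already_masked = np.ma.array([1.0, 2.0, np.nan], mask=[True, False, False])

from_nan = as_masked(plain)
from_ma = as_masked(already_masked)
```

Both results are masked arrays, so downstream code only has to deal with one
representation, and neither input is modified.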
Second, with functions that have multiple input arrays (pretty much all of
them), a single mask has to be applied to all data (typically a
logical_or'ing of the individual masks). Some other functions, such as the
pcolor family, have slightly more complicated mask merging.
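
The simple logical_or case could look roughly like the sketch below. The
helper name merge_masks is hypothetical, the inputs are assumed to have the
same shape, and the more involved pcolor-style merging is not attempted.

```python
import numpy as np

def merge_masks(*arrays):
    """Hypothetical helper: OR together the masks of same-shaped inputs."""
    merged = np.zeros(np.shape(arrays[0]), dtype=bool)
    for arr in arrays:
        # getmaskarray gives an all-False array for input that has no mask
        merged |= np.ma.getmaskarray(arr)
    return merged

x = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
y = np.ma.array([5.0, 6.0, 7.0, 8.0], mask=[False, False, False, True])

common = merge_masks(x, y)
# Wrap the original data with the merged mask; x and y keep their own masks.
x_plot = np.ma.array(np.ma.getdata(x), mask=common)
y_plot = np.ma.array(np.ma.getdata(y), mask=common)
```

A point is then dropped from every array if it is missing in any one of them.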
The most important thing is that we do not modify the user's data, and we
keep copies to a minimum. np.ma works great because we can convert the
arrays into masked_arrays without a copy, and the mask-merging process does
not modify the user's input data. I don't think we were using some of the
more advanced features of np.ma, but I can't be sure of that.
I guess the tricky case (which probably should be tested for) is when the
input array is already a masked array: we need to make sure that we aren't
changing the user's pre-existing masks.
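
A test for that case might look something like this sketch. The "internal"
array is only a stand-in for whatever a plotting function does with its
input; it is an assumption for illustration, not matplotlib code.

```python
import numpy as np

user = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
data_before = user.data.copy()
mask_before = user.mask.copy()

# Stand-in for internal processing: mask one more point on a no-copy view.
extra = np.array([False, False, False, True])
internal = np.ma.array(user.data, mask=mask_before | extra)

# The data buffer is shared, so no copy of the user's data was made ...
assert np.shares_memory(internal.data, user.data)
# ... and the user's own data and mask are exactly as they were.
assert np.array_equal(user.data, data_before)
assert np.array_equal(user.mask, mask_before)
```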
Ben Root
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel