Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

Travis Oliphant Mon, 23 Apr 2012 15:11:45 -0700

Thank you very much for contributing this description.    It is very helpful to 
see how people use numpy.ma in the wild.


-Travis

On Apr 11, 2012, at 2:57 PM, Paul Hobson wrote:

> Travis et al,
> 
> This isn't a reply to anything specific in your email and I apologize
> if there is a better thread or place to share this information. I've
> been meaning to participate in the discussion for a long time and
> never got around to it. The main thing I'd like to do is convey my
> typical use of the numpy.ma module as an environmental engineer
> analyzing censored datasets: contaminant concentrations that are
> either known, well-quantified values (not masked) or unknown values
> below an upper bound (masked).
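To make the censored-data layout concrete, here is a minimal sketch using the standard numpy.ma API with made-up concentrations. It assumes non-detects are stored at their detection limit and masked. It only shows how masking hides non-detects from a plain reduction while keeping the detection limits in the underlying data. The naive mean of the detected values is not the statistical estimate described in this post.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical example data (not from the original post), e.g. in ug/L.
# Non-detects are recorded at their detection limit (an upper bound).
values = np.array([5.0, 1.0, 12.0, 0.5, 3.2])
# True marks a non-detect: the true value is unknown but below values[i].
nondetect = np.array([False, True, False, True, False])

conc = ma.masked_array(values, mask=nondetect)

# Plain reductions on a masked array use only the unmasked (detected) values.
detected_mean = conc.mean()
n_detected = conc.count()

# The detection limits are still present in the underlying data.
detection_limits = conc.data[ma.getmaskarray(conc)]
```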
> 
> My basic understanding is that this discussion revolved around how to
> treat masked data (ignored vs missing) and how to implement one, both,
> or some middle ground between those two concepts. If I'm off-base,
> just ignore all of the following.
> 
> numpy.ma is implemented in a way that is very well suited to my needs.
> Here's a gist of something that was *really* hard for me before I
> discovered numpy.ma and numpy in general (it's a bit much; see below
> for the highlights):
> https://gist.github.com/2361814
> 
> The main message here is that I include the upper bounds of the
> unknown values (detection limits) in my array and use them to
> statistically estimate those values. I must be able to retrieve the
> masked detection limits throughout this process. Additionally, the
> masks as currently implemented allow me to sort the undetected values
> first, then the detected values (see __rosRanks in the gist).
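A minimal sketch of the two requirements above: keeping masked detection limits retrievable through `.data`, and ordering non-detects ahead of detects. The helper `nondetects_first_order` is hypothetical and is not the `__rosRanks` function from the gist, whose contents are not reproduced here. It uses `np.lexsort` with the mask as the primary key and the stored value as the secondary key.

```python
import numpy as np
import numpy.ma as ma

def nondetects_first_order(conc):
    """Hypothetical helper: indices that put non-detects (masked) first,
    then detected values, each group sorted by its stored value."""
    mask = ma.getmaskarray(conc)
    # np.lexsort treats the LAST key as the primary sort key.
    # (~mask) is 0 for non-detects, so they sort ahead of detected values.
    return np.lexsort((conc.data, (~mask).astype(int)))

conc = ma.masked_array([5.0, 1.0, 12.0, 0.5, 3.2],
                       mask=[False, True, False, True, False])
order = nondetects_first_order(conc)
# Fancy indexing a masked array carries the mask along with the data.
sorted_conc = conc[order]
```

Because the detection limits stay in `sorted_conc.data`, they remain available after sorting even though they are masked.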
> 
> As a boots-on-the-ground user of numpy, I'm ecstatic that this tool
> exists. I'm also pretty flexible and don't anticipate any major snags
> in my work if things change dramatically as the masked/missing/ignored
> functionality evolves.
> 
> Thanks to everyone for the hard work and great tools,
> -Paul Hobson
> 
> On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant <tra...@continuum.io> wrote:
>> Hey all,
>> 
>> I've been waiting for Mark Wiebe to arrive in Austin, where he will spend
>> several weeks, but I also know that masked arrays will be only one of the
>> things he and I are hoping to make headway on while he is in Austin.
>> Nevertheless, we need to make progress on the masked array discussion and if 
>> we want to finalize the masked array implementation we will need to finish 
>> the design.
>> 
>> I've caught up on most of the discussion, including Mark's NEP, Nathaniel's
>> NEP, other writings, and the very nice mailing list discussion that
>> included a somewhat detailed discussion of the algebra of IGNORED.   I think
>> there are some things still to be decided.  However, I think some things are 
>> pretty clear:
>> 
>>        1) Masked arrays are going to be fundamental in NumPy and these
>> should replace most people's use of numpy.ma.   The numpy.ma code will
>> remain as a compatibility layer.
>> 
>>        2) The reality of #1 and NumPy's general philosophy to date means 
>> that masked arrays in NumPy should support the common use-cases of masked 
>> arrays (including getting and setting of the mask from the Python and 
>> C-layers).  However, the semantics of what the mask implies may change from
>> what numpy.ma uses (True means masked) to having a True value mean selected.
>> 
>>        3) There will be missing-data dtypes in NumPy.   Likely only a
>> limited subset (string, bytes, int64, int32, float32, float64, complex64,
>> complex128, and object) with an API that allows more to be defined if
>> desired.   These will most likely use Mark's nice machinery for managing the 
>> calculation structure without requiring new C-level loops to be defined.
>> 
>>        4) I'm still not sure about whether the IGNORED concept is necessary 
>> or not.    I really like the separation that was emphasized between 
>> implementation (masks versus bit-patterns) and operations (propagating 
>> versus non-propagating).   Pauli even created another dimension which I 
>> don't totally grok and therefore can't remember.   Pauli?  Do you still feel 
>> that is a necessary construction?  But, do we need the IGNORED concept to 
>> indicate what amounts to different default keyword arguments to functions
>> that operate on NumPy arrays containing missing data (however that is 
>> represented)?    My current weak view is that it is not really necessary.   
>> But, I could be convinced otherwise.
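Points 2 and 4 can be sketched with existing NumPy calls. This is an illustration under stated assumptions, not the design being proposed: it shows that the numpy.ma mask convention (True means masked) is one `~` away from a "True means selected" convention, and that the implementation (a mask, or a NaN bit pattern for floats) is separable from the operation (propagating vs skipping). The function `total` and its `skip_missing` keyword are hypothetical, included only to show what "different default keyword arguments" could look like.

```python
import numpy as np
import numpy.ma as ma

x = ma.masked_array([1.0, 2.0, 4.0], mask=[False, True, False])

# numpy.ma convention: True in the mask means "masked" (hidden / invalid).
hidden = ma.getmaskarray(x)
# Alternative convention from point 2: True means "selected" (valid).
selected = ~hidden

# Implementation 1: a mask. numpy.ma reductions skip masked elements,
# i.e. they are non-propagating.
masked_sum = x.sum()

# Implementation 2: a bit pattern. For floats, NaN can play that role.
y = np.array([1.0, np.nan, 4.0])
propagating_sum = np.sum(y)   # NaN propagates to the result
skipping_sum = np.nansum(y)   # NaN is skipped

def total(a, skip_missing=False):
    """Hypothetical function: the operation (propagate vs skip) is chosen by
    a keyword argument, independent of how missing values are stored."""
    if ma.isMaskedArray(a):
        missing = ma.getmaskarray(a)
    else:
        missing = np.isnan(a)
    values = np.asarray(ma.getdata(a), dtype=float)
    if not skip_missing and missing.any():
        return np.nan  # propagating: any missing input makes the result missing
    return values[~missing].sum()  # non-propagating: ignore missing inputs
```

Under this framing, an IGNORED concept would amount to `skip_missing=True` as the default, which is the question raised in point 4.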
>> 
>> I think the good news is that given Mark's hard-work and Nathaniel's 
>> follow-up we are really quite far along.   I would love to get Nathaniel's 
>> opinion about what remains undone in the current NumPy code base.   I would
>> also appreciate knowing (from anyone with an interest) opinions on items 1-4
>> above and anything else I've left out.
>> 
>> Thanks,
>> 
>> -Travis
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion