On 2013/09/30 4:57 PM, Ondřej Čertík wrote:
> On Mon, Sep 30, 2013 at 8:29 PM, Eric Firing <[email protected]> wrote:
>> On 2013/09/30 4:05 PM, [email protected] wrote:
>>> On Mon, Sep 30, 2013 at 9:38 PM, Charles R Harris
>>> <[email protected]> wrote:
>>>>
>>>> On Mon, Sep 30, 2013 at 7:05 PM, Ondřej Čertík <[email protected]>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> What is the rationale for using False in 'mask' for elements that
>>>>> should be included?
>>>>>
>>>>> http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
>>>>>
>>>>> As opposed to using True for elements that should be included, which
>>>>> is what I was intuitively expecting when I started using the masked
>>>>> arrays. This "True convention" also happens to be the one used in
>>>>> Fortran, see e.g.:
>>>>>
>>>>> http://gcc.gnu.org/onlinedocs/gfortran/SUM.html
>>>>>
>>>>> So it's confusing why NumPy would chose a "False convention". Could it
>>>>> be, that NumPy views 'mask' as opacity? Then it would make sense to
>>>>> use True to make a value 'opaque'.
>>>>
>>>>
>>>> There was a lengthy discussion of this point back when the NA work was
>>>> done.
>>>> You might be able to find the thread with a search.
>>>>
>>>> As to why it is as it is, I suspect it is historical consistency. Pierre
>>>> wrote the masked array package for numpy, but it may very well go back to
>>>> the masked array package implemented for Numeric.
>>>
>>> I don't know ancient history, but I thought it's "natural". (Actually,
>>> I never thought about it.)
>>>
>>> I always thought `mask` indicates the "masked" (invalid, hidden)
>>> values, and masked arrays mask the missing values.
>>
>> Exactly. It is also consistent with the C and Unix convention of
>> returning 0 on success and 1, or a non-zero error code on failure. In a
>> similar vein, it works nicely with bit-mapped quality control flags,
>> etc. When nothing is flagged, the value is good, and consequently not
>> masked out.
>
> I see, that makes sense. So to remember this, the rule is:
>
> "Specify elements that you want to get masked using True in 'mask'".
>
> But why do I need to invert the mask when I want to see the valid elements:
>
> In [1]: from numpy import ma
>
> In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
>
> In [3]: a
> Out[3]:
> masked_array(data = [1 2 -- 4],
>              mask = [False False True False],
>        fill_value = 999999)
>
>
> In [4]: a[~a.mask]
> Out[4]:
> masked_array(data = [1 2 4],
>              mask = [False False False],
>        fill_value = 999999)
>
>
> I would find natural to write [4] as a[a.mask]. This is when it gets
> confusing.

There is no getting around it; each of the two possible conventions has
its advantages. But try this instead:

In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

In [3]: a.compressed()
Out[3]: array([1, 2, 4])

I do occasionally need a "goodmask" which is the inverse of a.mask, but
not very often; and when I do, needing to invert a.mask doesn't bother me.

Eric

>
> For example in Fortran, one does:
>
> integer :: a(4) = [1, 2, 3, 4]
> logical :: mask(4) = [.true., .true., .false., .true.]
> print *, a
> print *, pack(a, mask)
>
> and it prints:
>
> 1 2 3 4
> 1 2 4
>
> So the behavior of mask when used as an index to select elements from
> an array is identical to NumPy --- True means include the element,
> False means exclude it.
>
> Ondrej
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
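
For reference, the selections discussed in this thread can be compared side by side. This is only a minimal sketch, assuming a reasonably recent NumPy; the variable names (valid, goodmask, packed, hidden) are illustrative, with "goodmask" borrowed from Eric's wording above.

```python
import numpy as np
from numpy import ma

# Same array as in the thread: True in `mask` marks an element as masked out.
a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

# compressed() returns the unmasked values as a plain ndarray.
valid = a.compressed()

# A "goodmask" is the inverse of the mask: True means "include".
# getmaskarray() always returns a full boolean array, even when nothing is masked.
goodmask = ~ma.getmaskarray(a)

# Boolean indexing on the underlying data with goodmask plays the role of
# Fortran's pack(a, mask), where .true. means "include".
packed = a.data[goodmask]

# Indexing the data with the mask itself picks out the masked-out positions.
hidden = a.data[ma.getmaskarray(a)]

print(valid)   # [1 2 4]
print(packed)  # [1 2 4]
print(hidden)  # [3]
```

So the two conventions differ only by one inversion: boolean indexing follows the "True means include" rule, while `mask` follows the "True means hidden" rule.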
