Hi Eric,

On your other points:

> I remain unconvinced that Mask classes should behave differently on
> different ufuncs. I don’t think np.minimum(ignore_na, b) is any different
> to np.add(ignore_na, b) - either both should produce b, or both should
> produce ignore_na. I would lean towards producing ignore_na, and
> propagation behavior differing between “ignore” and “invalid” only for
> reduce / accumulate operations, where the concept of skipping an
> application is well-defined.
>
I think I mostly agree - this is really about reductions. And the fact
that there are apparently only two choices weakens the case for pushing the
logic into the mask class itself.

But the one case that still tempts me to break with the strict rule for
ufunc.__call__ is `fmin, fmax` vs `minimum, maximum`... What do you think?
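
For concreteness, a minimal sketch of what I mean, using NaN as a stand-in
for a masked element (no Mask class involved) - the two pairs already
disagree on whether the "bad" value propagates:

    import numpy as np

    a = np.array([1.0, np.nan, 3.0])   # nan stands in for a masked element
    b = np.array([2.0, 2.0, 2.0])

    print(np.minimum(a, b))  # [ 1. nan  2.]  nan propagates ("invalid"-like)
    print(np.fmin(a, b))     # [ 1.  2.  2.]  nan is skipped ("ignore"-like)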


> Some possible follow-up questions that having two distinct masked types
> raise:
>
>    - what if I want my data to support both invalid and skip fields at
>    the same time? sum([invalid, skip, 1]) == invalid
>
Have a triple-valued mask? Would be easy to implement if all the logic is
in the mask...

(Indeed, for all I care it could implement weighting! That would actually
care about the actual operation, so would be a real example. Though of
course it also does need access to the actual data, so perhaps best not to
go there...)
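
A minimal sketch of what just the mask-side logic might look like for a
reduction, with hypothetical VALID/SKIP/INVALID codes (data handling
omitted; the only point is that invalid dominates skip, which dominates
valid):

    import numpy as np

    # Hypothetical three-state mask codes, ordered so that "worse" is larger.
    VALID, SKIP, INVALID = 0, 1, 2

    def reduce_mask(mask, axis=None):
        # INVALID beats SKIP beats VALID, so a plain max along the
        # reduction axis gives the combined state of the result.
        return np.max(np.asarray(mask), axis=axis)

    # Eric's example: sum([invalid, skip, 1]) == invalid.
    print(reduce_mask([INVALID, SKIP, VALID]))  # 2, i.e. INVALID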

>
>    - is there a use case for more than these two types of mask?
>    invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting
>    things to track through a calculation, possibly a dictionary of named 
> masks.
>
For astropy's NDData, there has been quite a bit of discussion of a
`Flags` object, which works exactly as you describe: an OR-ing together of
different reasons why data is invalid (HST uses this, though the
discussion was for the Large Synoptic Survey Telescope data pipeline). Those
flags are propagated like masks.

I think in most cases, these examples would not require more than allowing
the mask to be a duck type. Though perhaps for some reductions, it might
matter what the reduction of the data is actually doing (e.g.,
`np.minimum.reduce` might need different rules than `np.add.reduce`). And,
of course, one can argue that for such a case it might be best to subclass
MaskedArray itself, and do the logic there.
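
To make that concrete, a rough sketch of such a duck-typed mask (the
`Flags` class and bit names are invented here, just for illustration) - the
only thing it knows is that reasons combine by OR, element-wise and under a
reduction:

    import numpy as np

    class Flags:
        """Toy OR-combining mask: each element holds a bitmask of reasons
        why it is invalid (0 == valid).  Purely illustrative."""
        SATURATED = 1
        COSMIC_RAY = 2

        def __init__(self, bits):
            self.bits = np.asarray(bits, dtype=np.uint8)

        def __or__(self, other):
            # Element-wise combination for a binary operation: reasons add up.
            return Flags(self.bits | other.bits)

        def reduce(self, axis=None):
            # Propagation through a reduction: OR all reasons along the axis.
            return Flags(np.bitwise_or.reduce(self.bits, axis=axis))

    f1 = Flags([0, Flags.SATURATED, 0])
    f2 = Flags([Flags.COSMIC_RAY, 0, 0])
    print((f1 | f2).bits)    # [2 1 0]
    print(f1.reduce().bits)  # 1 - any saturated element taints the total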

All the best,

Marten

p.s. For accumulations, I'm still not sure I find them well-defined. I
could see that `np.add.accumulate([0, 1, 1, --, 3])` could lead to `[0, 1,
2, 5]`, i.e., a shorter sequence, but this doesn't work on arrays where
different rows can have different numbers of masked elements. It then
perhaps suggests `[0, 1, 2, --, 5]` is OK, but the annoyance I have is that
there is nothing that tells me what the underlying data should be, i.e.,
this is truly different from having a `where` keyword in `np.add.reduce`.
But perhaps it is just that I do not see much use for accumulation beyond
changing a histogram into its cumulative version - for which masked
elements really make no sense; one somehow has to interpolate over the
masked elements, not just discount them.
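
(For what it is worth, np.ma's cumsum appears to make the second choice
already: masked entries contribute nothing and the corresponding positions
of the result stay masked, i.e. the `[0, 1, 2, --, 5]` option above.)

    import numpy as np

    a = np.ma.masked_array([0, 1, 1, 99, 3], mask=[0, 0, 0, 1, 0])
    print(np.ma.cumsum(a))  # [0 1 2 -- 5]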