Hi Tom,

>> I think a sensible alternative mental model for the MaskedArray class is
>> that all it does is forward any operations to the data it holds and
>> separately propagate a mask,
>
> I'm generally on-board with that mental picture, and agree that the
> use-case described by Ben (different layers of satellite imagery) is
> important. Same thing happens in astronomy data, e.g. you have a CCD image
> of the sky and there are cosmic rays that contaminate the image. Those are
> not garbage data, just pixels that one wants to ignore in some, but not
> all, contexts.
>
> However, it's worth noting that one cannot blindly forward any operations
> to the data it holds since the operation may be illegal on that data. The
> simplest example is dividing `a / b` where `b` has data values of 0 but
> they are masked. That operation should succeed with no exception, and here
> the resultant value under the mask is genuinely garbage.

Even in the present implementation, the operation is just forwarded, with
numpy errstate set to ignore all errors. And then after the fact some
half-hearted remediation is done.

> The current MaskedArray seems a bit inconsistent in dealing with invalid
> calculations. Dividing by 0 (if masked) is no problem and returns the
> numerator. Taking the log of a masked 0 gives the usual divide by zero
> RuntimeWarning and puts a 1.0 under the mask of the output.
>
> Perhaps the expression should not even be evaluated on elements where the
> output mask is True, and all the masked output data values should be set to
> a predictable value (e.g. zero for numerical, zero-length string for
> string, or maybe a default fill value). That at least provides consistent
> and predictable behavior that is simple to explain. Otherwise the story is
> that the data under the mask *might* be OK, unless for a particular element
> the computation was invalid in which case it is filled with some arbitrary
> value. I think that is actually an error-prone behavior that should be
> avoided.

I think I agree with Allan here, that after a computation, one generally
simply cannot safely assume anything for masked elements.

But it is reasonable for subclasses to define what they want to do
"post-operation"; e.g., for numerical arrays, it might generally make
sense to do
```
notok = ~np.isfinite(result)
mask |= notok
```
and one could then also do
```
result[notok] = fill_value
```
But I think one might want to leave that to the user.

All the best,

Marten
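
For illustration, the two snippets above can be put together into a small self-contained sketch. This is not how `np.ma.MaskedArray` is implemented; `masked_divide` and its arguments are hypothetical names, and data and mask are assumed to be plain ndarrays kept side by side. It only shows the "forward the operation, propagate the mask, then remediate" idea.

```python
import numpy as np


def masked_divide(a_data, a_mask, b_data, b_mask, fill_value=None):
    """Hypothetical helper: forward a / b to the data and propagate the mask."""
    # Forward the operation to the underlying data, silencing the warnings
    # that elements such as masked zeros would otherwise trigger.
    with np.errstate(all="ignore"):
        result = np.true_divide(a_data, b_data)

    # Propagate the mask: an output element is masked if either input was.
    mask = a_mask | b_mask

    # Post-operation step: additionally mask anything that is not finite.
    notok = ~np.isfinite(result)
    mask |= notok

    # Optional remediation; leave fill_value as None to keep the raw values.
    if fill_value is not None:
        result[notok] = fill_value
    return result, mask


a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 0.0])
a_mask = np.array([False, False, False])
b_mask = np.array([False, True, False])  # only the first zero in b is masked

result, mask = masked_divide(a, a_mask, b, b_mask, fill_value=0.0)
```

Here the last element was not masked on input, but the division by an unmasked zero produces a non-finite value, so it ends up masked as well. Whether to overwrite such values with a fill value is exactly the choice that could be left to the user.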
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion