On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer <sho...@gmail.com> wrote:
> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane <allanhald...@gmail.com>
> wrote:
>
>> I'm not at all set on that behavior and we can do something else. For
>> now, I chose this way since it seemed to best match the "IGNORE" mask
>> behavior.
>>
>> The behavior you described further above where the output row/col would
>> be masked corresponds better to "NA" (propagating) mask behavior, which
>> I am leaving for later implementation.
>
>
> This does seem like a clean way to *implement* things, but from a user
> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
> masks.
>
> I tend to think of "IGNORE" vs "NA" as descriptions of particular
> operations rather than the data itself. There are a spectrum of ways to
> handle missing data, and the right way to propagating missing values is
> often highly context dependent. The right way to set this is in functions
> where operations are defined, not on classes that may be defined far away
> from where the computation happen. For example, pandas has a "min_count"
> parameter in functions for intermediate use-cases between "IGNORE" and "NA"
> semantics, e.g., "take an average, unless the number of data points is
> fewer than min_count."
>

Anything that specific is probably indeed outside the purview of a MaskedArray class. But your general point is well taken: we really need to ask clearly what the mask means, not in terms of operations but conceptually.

Personally, I guess like Benjamin I have mostly thought of it as "data here is bad" (because corrupted, etc.) or "data here is irrelevant" (because of sea instead of land in a map). And I would nevertheless like to proceed with calculating things on the remainder. For an expectation value (or, less obviously, a minimum or maximum), this is mostly OK: just ignore the masked elements.
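To make the "just ignore the masked elements" case concrete, here is a minimal sketch using the existing `np.ma` masked array. The sample values are made up for illustration, and `True` in the mask is assumed to mean "bad or irrelevant data":

```python
import numpy as np

# Illustrative data only: the last element stands in for a corrupted value.
data = np.array([1.0, 2.0, 3.0, 100.0])
mask = np.array([False, False, False, True])  # True = ignore this element

m = np.ma.array(data, mask=mask)

# With "IGNORE" semantics, these reductions simply skip masked elements.
mean_ignoring = m.mean()  # average of 1, 2, 3
min_ignoring = m.min()    # smallest unmasked value
max_ignoring = m.max()    # largest unmasked value; the masked 100.0 is skipped
```

For these reductions the result does not depend on how many elements were dropped, which is why ignoring the mask is mostly harmless here.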
But even for something as simple as a sum, what is correct is not obvious: if I ignore the count, I'm effectively assuming the expectation is symmetric around zero (this is why `vector.dot(vector)` fails); a better estimate would be `np.add.reduce(data, where=~mask) * N(total) / N(unmasked)`.

Of course, the logical conclusion would be that this is not possible to do without guidance from the user, or, thinking more, that a masked array is really not at all what I want for this problem; really I am just using (1-mask) as a weight, and the sum of the weights matters, so I should have a `WeightArray` class where that is returned along with the sum of the data (or, a bit less extreme, a `CountArray` class, or, more extreme, a value and its uncertainty - heck, sounds a lot like my Variable class from 4 years ago, https://github.com/astropy/astropy/pull/3715, which even takes care of covariance [following the Uncertainty package]).

OK, it seems I've definitely worked myself into a corner tonight where I'm not sure any more what a masked array is good for in the first place... I'll stop for the day!

All the best,

Marten
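As an illustration of the rescaled sum mentioned above, here is a minimal sketch. The sample values and the names `n_total` and `n_unmasked` are made up for this example, and it assumes a NumPy version whose reductions accept `where=`:

```python
import numpy as np

# Illustrative data only: the last element is treated as masked (bad).
data = np.array([1.0, 2.0, 3.0, 100.0])
mask = np.array([False, False, False, True])  # True = ignore this element

n_total = data.size
n_unmasked = np.count_nonzero(~mask)

# Plain "IGNORE" sum: masked elements contribute as if they were zero.
plain_sum = np.add.reduce(data, where=~mask)

# Rescaled estimate: assume the masked elements look, on average, like the
# unmasked ones, so scale the partial sum up by N(total) / N(unmasked).
rescaled_sum = plain_sum * n_total / n_unmasked

print(plain_sum, rescaled_sum)
```

The two results differ whenever anything is masked, and choosing between them needs exactly the kind of user guidance discussed above: the rescaled value is only sensible if the masked elements are expected to resemble the rest.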
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion