On 6/24/19 3:09 PM, Marten van Kerkwijk wrote:
> Hi Allan,
>
> Thanks for bringing up the noclobber explicitly (and Stephan for asking for clarification; I was similarly confused).
>
> It does clarify the difference in mental picture. In mine, the operation would indeed be guaranteed to be done on the underlying data, without copy and without `.filled(...)`. I should clarify further that I use `where` only to skip reading elements (i.e., in reductions), not writing elements (as you mention, the unwritten element will often be nonsense - e.g., wrong units - which to me is worse than infinity or something similar; I've not worried at all about runtime warnings). Note that my main reason here is not that I'm against filling with numbers for numerical arrays, but rather wanting to make minimal assumptions about the underlying data itself. This may well be a mistake (but I want to find out where it breaks).
>
> Anyway, it would seem in many ways all the better that our models are quite different. I definitely see the advantages of your choice to decide that one can do with masked data elements whatever is logical ahead of an operation!
>
> Thanks also for bringing up a useful example with `np.dot(m, m)` - clearly, I didn't yet get beyond overriding ufuncs!
>
> In my mental model, where I'd apply `np.dot` to the data and the mask separately, the result will be wrong, so the mask has to be set (which it would be). For your specific example, that might not be the best solution, but when using `np.dot(matrix_shaped, matrix_shaped)`, I think it does give the correct masking: any masked element in a matrix had better propagate to all parts that it influences, even if there is a reduction of sorts happening. So, perhaps that is a price to pay for a function that tries to do multiple things.
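[The "`where` only to skip reading elements in reductions" idea above can be sketched with plain NumPy; this is a minimal illustration, not either implementation, and `data`/`mask` are made-up example arrays:]

```python
import numpy as np

# Hypothetical example data: one element is junk behind the mask.
data = np.array([1.0, 2.0, 999.0, 4.0])
mask = np.array([False, False, True, False])  # True = masked

# In a reduction, `where` skips *reading* the masked element entirely;
# no fill value is ever written into `data`. `initial` supplies the
# starting value for the skipped positions.
total = np.add.reduce(data, where=~mask, initial=0.0)
# total is 1 + 2 + 4 = 7.0; data[2] was never read into the sum
```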
> The alternative solution in my model would be to replace `np.dot` with a masked-specific implementation of what `np.dot` is supposed to stand for (in your simple example, `np.add.reduce(np.multiply(m, m))` - more generally, adding the relevant `outer` and `axes`). This would be similar to what I think all implementations do for `.mean()` - we cannot calculate that from the data using any fill value or skipping, so we instead use the more easily cared-for `.sum()` and divide by a suitable number of elements. But in both examples the disadvantage is that we take away the option to use the underlying class's `.dot()` or `.mean()` implementations.
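[The `.sum()`-and-divide construction for `.mean()` quoted above can be sketched like this - a minimal sketch using plain NumPy arrays standing in for a masked array; the names and data are made up:]

```python
import numpy as np

data = np.array([1.0, 2.0, 300.0, 4.0])  # 300.0 is junk behind the mask
mask = np.array([False, False, True, False])  # True = masked

# A masked .mean() cannot be obtained by filling the data with any value,
# but it decomposes into a maskable .sum() divided by the count of
# unmasked elements.
valid = ~mask
masked_mean = np.sum(data, where=valid) / valid.sum()
# (1 + 2 + 4) / 3
```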
Just to note, my current implementation uses the IGNORE style of mask, so it does not propagate the mask in `np.dot`:

    >>> a = MaskedArray([[1, 1, 1],
    ...                  [1, X, 1],
    ...                  [1, 1, 1]])
    >>> np.dot(a, a)
    MaskedArray([[3, 2, 3],
                 [2, 2, 2],
                 [3, 2, 3]])

I'm not at all set on that behavior and we can do something else. For now, I chose this way since it seemed to best match the "IGNORE" mask behavior. The behavior you described further above, where the output row/col would be masked, corresponds better to "NA" (propagating) mask behavior, which I am leaving for a later implementation.

best,
Allan

> (Aside: considerations such as these underlie my longed-for exposure of standard implementations of functions in terms of basic ufunc calls.)
>
> Another example of a function for which I think my model is not particularly insightful (and for which it is difficult to know what to do generally) is `np.fft.fft`. Since an FFT is equivalent to sine/cosine fits to the data points, the answer for masked data is in principle quite well-defined. But it is much less easy to implement!
>
> All the best,
>
> Marten

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
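[The IGNORE-versus-NA distinction discussed in this thread can be illustrated with plain NumPy - a minimal sketch, not the actual MaskedArray implementation; the data/mask representation here is made up:]

```python
import numpy as np

data = np.array([[1, 1, 1], [1, 9, 1], [1, 1, 1]])  # 9 is junk behind the mask
mask = np.zeros_like(data, dtype=bool)
mask[1, 1] = True

# IGNORE-style dot: the masked element simply drops out of each sum,
# which is equivalent to filling it with the additive identity, 0.
ignore_dot = np.dot(np.where(mask, 0, data), np.where(mask, 0, data))
# reproduces [[3, 2, 3], [2, 2, 2], [3, 2, 3]] from the session above

# NA-style (propagating) dot: an output element is masked whenever any
# masked input element contributed to its sum, i.e. the whole row and
# column touching the masked element become masked.
na_mask = mask.any(axis=1)[:, None] | mask.any(axis=0)[None, :]
```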