Hi Carlos,

Indeed, implementing NaN as meaning "omit" for some functions is not too difficult now. Your example actually leads to the opposite conclusion about ease, though, since you really should also count the implementation of ``where`` (I added it, with use in the nanfunctions as one of the planned goals). That makes things simple now, but under the hood the reductions have to take a different path if ``where`` is present (see the code in `umath/reduction.c`). So overall, supporting NaN as missing is actually not simple even for ``sum``, and I am fairly certain the same will hold generally.
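As a small illustration of the ``where`` route (a sketch only, not the code path in `umath/reduction.c`): masking out NaN through ``where`` gives the same result as ``nansum``, and a user-supplied mask can be combined with it. The helper name `sum_ignoring_nan` is made up for this example and is not part of numpy.

```python
import numpy as np


def sum_ignoring_nan(a, axis=None, where=None):
    """Hypothetical helper: sum `a`, skipping NaN entries via `where`."""
    a = np.asarray(a, dtype=float)
    keep = ~np.isnan(a)          # True wherever the element is a real number
    if where is not None:
        keep = keep & where      # combine with any mask the caller supplies
    return np.sum(a, axis=axis, where=keep)


a = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])

total = sum_ignoring_nan(a)              # all non-NaN elements
by_row = sum_ignoring_nan(a, axis=1)     # per-row sums, NaN skipped

# An extra user mask that also drops the 3.0 in the first row.
user_mask = np.array([[True, True, False],
                      [True, True, True]])
masked = sum_ignoring_nan(a, where=user_mask)
```

The Python side is short precisely because ``where`` already does the work inside the reduction; the cost sits in the C implementation rather than here.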
Now I can see why you would dislike creating a new class, as it adds complexity. But in the end simplicity matters for functions too: if I write code to deal with an array of floats, it is far simpler if I can treat the elements as standard floats, with NaN having its standard meaning of Not a Number. It also keeps maintenance of those functions simpler, with fewer tests for fewer combinations of arguments, and helps standardization between different array types: if we were to go your route, *every* array implementation would have to start supporting treating NaN in different ways. (And why stop there? IIRC, pandas uses the most negative int to signal a masked value; should we start supporting that too?)

Now another way of thinking is that the array should be the same, but it needs to be explicit about how its data is interpreted, i.e., signal that it wants NaN treated as missing. That does not necessarily require a new array class, but may be possible by creating a new data type, which wraps a regular float. Conceptually, though, that requires creating new float loops for every ufunc for which this may matter, so again not simple.

Finally, I note that in the data api issue you quote:

> It is better to have 100 functions operate on one data structure than 10
> functions on 10 data structures.

But the obvious answer to that is that, in fact, numpy does exactly that by providing the nanfunctions. There is nothing stopping you from using those functions all the time, even when arrays may not have `NaN`. Indeed, in a way my suggested new NanMask Array API compatible class would just bundle those nanfunctions in a more convenient package...

Anyway, in the end I think all approaches will end up costing essentially the same amount of effort, and I think that for the relatively niche case of using NaN as masks, one should pick an approach that does not require changes to the base numpy implementations.
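To make the "bundle the nanfunctions" idea concrete, a rough sketch of such a wrapper might look like the following. The class name `NanOmitting` and its methods are purely hypothetical: nothing like this exists in numpy, and it is not the actual NanMask proposal. It only forwards to the existing nanfunctions and leaves the base implementations untouched.

```python
import numpy as np


class NanOmitting:
    """Hypothetical wrapper whose reductions treat NaN as missing."""

    def __init__(self, data):
        # Store a plain float array; the interpretation of NaN lives in
        # the wrapper, not in the array itself.
        self.data = np.asarray(data, dtype=float)

    def sum(self, axis=None):
        return np.nansum(self.data, axis=axis)

    def mean(self, axis=None):
        return np.nanmean(self.data, axis=axis)

    def max(self, axis=None):
        return np.nanmax(self.data, axis=axis)


x = NanOmitting([1.0, np.nan, 3.0])
plain = NanOmitting([1.0, 2.0, 3.0])   # no NaN present at all

x_sum, x_mean, x_max = x.sum(), x.mean(), x.max()
plain_sum = plain.sum()
```

Since the nanfunctions also accept arrays without NaN, `plain.sum()` agrees with a regular `np.sum`, which is the sense in which one could use them all the time.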
All the best,

Marten

"Carlos Martin" <[email protected]> writes:

>> The costs I worry about are performance and increased maintenance burden for
>> the regular, no-nan case. For instance, the "obvious" way to implement a
>> nan-omitting sum would be to check inside a loop whether any given element
>> was nan, thus slowing down the regular case (e.g., by breaking
>> vectorization). To avoid this one has to be careful, thus making code
>> harder to write, more fragile, and more difficult to maintain (analogous to
>> -- but worse than -- tracking floating point errors).
>
> I'm not sure I understand your objection here. Consider the way `nansum` is
> currently implemented:
> https://github.com/numpy/numpy/blob/76e91189b23d4e0afc34130e95f4f460a3d57d95/numpy/lib/_nanfunctions_impl.py#L725.
>
>> a, mask = _replace_nan(a, 0)
>> return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims,
>> initial=initial, where=where)
>
> The `ignore_nan` version would simply do the same thing, but inside the body
> of `numpy.sum`. Or it can call `np.sum` with `where=~np.isnan(a) if where is
> None else ~np.isnan(a) & where` (i.e., combining with any mask the user
> supplies).
>
> I object to the approach of complicating the array ontology, for the reasons
> described here:
> https://github.com/data-apis/array-api/issues/621#issuecomment-3433986363.
> _______________________________________________
> NumPy-Discussion mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> Member address: [email protected]
