Hi Carlos,

Indeed, the implementation for making NaN mean "omit" for some functions
is not too difficult now.  Your example actually leads to the opposite
conclusion about ease, though, as you really should count also the
implementation of ``where`` (I added it... with use in the nanfunctions
as one of the planned goals...).  This makes it simple now, but under the
hood the reductions have to take a different path if ``where`` is present
(see code in `umath/reduction.c`).  So overall supporting NaN as missing
is actually not simple even for ``sum``, and I am fairly certain the
same will hold generally.
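
To make the comparison concrete, here is a minimal sketch of the two
routes at the Python level (the array `a` and the mask `user_mask` are
made-up illustration data; only `np.sum`, `np.nansum` and `np.isnan` are
real numpy calls).  It shows only the user-facing equivalence, not the
extra reduction path that ``where`` needs in C:

```python
import numpy as np

# Made-up example data with a few NaN entries.
a = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])

# Route 1: the dedicated nanfunction.
via_nansum = np.nansum(a, axis=1)

# Route 2: a regular sum with a ``where`` mask that skips NaN entries.
via_where = np.sum(a, axis=1, where=~np.isnan(a))

# Combining the NaN mask with a (hypothetical) user-supplied mask.
user_mask = np.array([[True, True, False],
                      [True, True, True]])
via_combined = np.sum(a, axis=1, where=~np.isnan(a) & user_mask)

print(via_nansum, via_where, via_combined)
```

Both routes give the same result here; the point is that the second one
only looks cheap because ``where`` already exists.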

Now I can see why you would dislike creating a new class, as it adds
complexity.  But in the end the simplicity argument holds for functions
too: if I write code to deal with an array of floats, it is far simpler
if I can treat the elements as standard floats, with NaN keeping its
standard meaning of Not a Number.

It also keeps maintenance of those functions simpler, with fewer tests
for fewer combinations of arguments, and helps standardization between
different array types: if we were to go your route, *every* array
implementation would have to start supporting treating NaN in different
ways.
(And why stop there?  IIRC, pandas uses the most negative int to signal
a masked value; should we start supporting that too?)

Now another way of thinking is that the array should be the same, but it
needs to be explicit about how its data is interpreted, i.e., signal
that it wants NaN treated as missing.  That does not necessarily require
a new array class, but may be possible by creating a new data type,
which wraps a regular float.  Conceptually, though, that requires
creating new float loops for every ufunc for which this may matter, so
again not simple.

Finally, I note that in the data api issue you quote:

> It is better to have 100 functions operate on one data structure than 10 
> functions on 10 data structures.

But the obvious answer to that is that, in fact, numpy does exactly that
by providing the nanfunctions.  There is nothing stopping you from using
those functions all the time, even when arrays may not have `NaN`.
Indeed, in a way my suggested new NanMask Array API compatible class
would just bundle those nanfunctions in a more convenient package...
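
As a rough sketch of what I mean by "bundling" (the class name
`NanMaskArray` and its methods are purely hypothetical here, and this is
nowhere near a full Array API implementation), such a class could simply
forward to the existing nanfunctions:

```python
import numpy as np


class NanMaskArray:
    """Hypothetical wrapper that treats NaN entries as missing."""

    def __init__(self, data):
        # Store plain floats; NaN marks a missing element.
        self.data = np.asarray(data, dtype=float)

    def sum(self, axis=None):
        # Delegate to the existing nanfunction instead of np.sum.
        return np.nansum(self.data, axis=axis)

    def mean(self, axis=None):
        return np.nanmean(self.data, axis=axis)

    def max(self, axis=None):
        return np.nanmax(self.data, axis=axis)


x = NanMaskArray([1.0, np.nan, 3.0])
total = x.sum()
average = x.mean()
largest = x.max()
print(total, average, largest)
```

The base numpy functions stay untouched; all the NaN-as-missing logic
lives in the wrapper.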

Anyway, in the end I think all approaches will end up costing essentially
the same amount of effort, and I think for the relatively niche case of
using NaN as masks, one should pick one that does not require changes to
the base numpy implementations.

All the best,

Marten



"Carlos Martin" <[email protected]> writes:

>> The costs I worry about are performance and increased maintenance burden for 
>> the regular, no-nan case.  For instance, the "obvious" way to implement a 
>> nan-omitting sum would be to check inside a loop whether any given element 
>> was nan, thus slowing down the regular case (e.g., by breaking 
>> vectorization).  To avoid this one has to be careful, thus making code 
>> harder to write, more fragile, and more difficult to maintain (analogous to 
>> -- but worse than -- tracking floating point errors).
>
> I'm not sure I understand your objection here. Consider the way `nansum` is 
> currently implemented: 
> https://github.com/numpy/numpy/blob/76e91189b23d4e0afc34130e95f4f460a3d57d95/numpy/lib/_nanfunctions_impl.py#L725.
>
>> a, mask = _replace_nan(a, 0)
>> return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims, 
>> initial=initial, where=where)
>
> The `ignore_nan` version would simply do the same thing, but inside the body 
> of `numpy.sum`. Or it can call `np.sum` with `where=~np.isnan(a) if where is 
> None else ~np.isnan(a) & where` (i.e., combining with any mask the user 
> supplies).
>
> I object to the approach of complicating the array ontology, for the reasons 
> described here: 
> https://github.com/data-apis/array-api/issues/621#issuecomment-3433986363.
> _______________________________________________
> NumPy-Discussion mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> Member address: [email protected]
