Hi All,

I do not like the idea of having a new ``nan_policy`` or ``ignore_nan``
keyword argument, as I think the cost-benefit ratio is poor.

The costs I worry about are performance and increased maintenance burden
for the regular, no-nan case.  For instance, the "obvious" way to
implement a nan-omitting sum would be to check inside a loop whether any
given element was nan, thus slowing down the regular case (e.g., by
breaking vectorization).  To avoid this, one has to be careful, thus
making code harder to write, more fragile, and more difficult to
maintain (analogous to -- but worse than -- tracking floating point
errors).
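To make the concern concrete, here is a deliberately naive sketch in pure Python. ``nansum_naive`` is a hypothetical name, not a NumPy function. It only shows the structure of the "obvious" approach, which is a data-dependent branch on every element. In NumPy's compiled inner loops, this kind of branch is what can get in the way of SIMD vectorization.

```python
import math

import numpy as np


def nansum_naive(values):
    """Hypothetical nan-omitting sum with a per-element nan check."""
    total = 0.0
    for x in values:
        # This test runs for every element, even when no nan is present.
        if not math.isnan(x):
            total += x
    return total


data = [1.0, float("nan"), 2.5]
print(nansum_naive(data), np.nansum(data))  # 3.5 3.5
```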

To me, the benefits seem small in comparison. This is partially because,
in thinking about masked arrays, I realized that skipping elements is not
that often the right idea: e.g., it makes sense for taking a mean or
median, but less so for summing, as the result is pretty meaningless
unless one knows how many elements went in.  (The logic is more obvious
when one considers arrays of physical quantities; hence, the masked
array class I wrote for astropy just propagates masks by default; see
reasoning at https://docs.astropy.org/en/latest/utils/masked/)
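The following small illustration of the mean-versus-sum point uses the existing ``np.nanmean`` and ``np.nansum``. When a nan is dropped, the mean still estimates the same quantity. The sum, however, silently changes meaning unless the count is carried along.

```python
import numpy as np

full = np.array([1.0, 2.0, 3.0])
gappy = np.array([1.0, np.nan, 3.0])

# Omitting the nan, the mean still estimates the same quantity...
print(np.mean(full), np.nanmean(gappy))      # 2.0 2.0
# ...but the sum silently changes meaning,
print(np.sum(full), np.nansum(gappy))        # 6.0 4.0
# unless the number of contributing elements is tracked as well.
print(np.count_nonzero(~np.isnan(gappy)))    # 2
```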

Trying to understand the use case for omitting nan instead of
propagating it, it would seem that the main one is in fact to have a
simple, implicit masked array.  But maybe it is OK to be explicit and
use ``MaskedArray(data, mask=np.isnan(data))``?  With a helper function
along the lines of ``np.ma.masked_invalid`` (but only masking nan),
one would do ``function(mask_nans(data), ...)``, which does not seem
much worse than ``function(data, ..., nan_policy='omit')``.
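Here is a minimal sketch of such a helper, assuming it lives outside NumPy (``mask_nans`` is a hypothetical name). ``np.ma.masked_invalid`` masks every non-finite value. This helper differs in that it masks only nan and leaves inf visible.

```python
import numpy as np


def mask_nans(data):
    """Hypothetical helper: mask nan (but not inf) in a numpy.ma array."""
    data = np.asarray(data)
    return np.ma.MaskedArray(data, mask=np.isnan(data))


data = np.array([1.0, np.nan, 3.0, np.inf])
print(mask_nans(data).mask)               # [False  True False False]
print(np.ma.masked_invalid(data).mask)    # [False  True False  True]

# Reductions on the masked array skip the masked element explicitly.
print(np.mean(mask_nans([1.0, np.nan, 3.0])))  # 2.0
```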

Continuing the analogy with masked arrays, if one dislikes the idea of
carrying an explicit mask, perhaps there should be a new class that uses
nan explicitly as a mask marker? That certainly is possible, overriding
numpy functions with ``__array_ufunc__`` and ``__array_function__`` to
change their behaviour accordingly (or use the Array API).  This would
seem a good project for a new module outside of numpy (one where
eventually the nanfunctions could be moved??).
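Below is a minimal sketch of what such a class might look like. It assumes a small wrapper built on ``np.lib.mixins.NDArrayOperatorsMixin``. ``NanMasked`` is a hypothetical name, not an existing NumPy or astropy class. Elementwise ufuncs simply propagate nan, i.e., the "mask". Reductions skip nan via the ufunc's ``where`` argument. Ufuncs without an identity (such as ``np.maximum``) are left unsupported here rather than guessed at.

```python
import numpy as np


class NanMasked(np.lib.mixins.NDArrayOperatorsMixin):
    """Hypothetical array wrapper that treats nan as a mask marker."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __repr__(self):
        return f"NanMasked({self.data!r})"

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap NanMasked inputs to plain ndarrays.
        args = [x.data if isinstance(x, NanMasked) else x for x in inputs]
        if method == "__call__":
            # Elementwise: nan propagates, so the "mask" follows automatically.
            return NanMasked(ufunc(*args, **kwargs))
        if method == "reduce" and ufunc.identity is not None:
            # Reduction: skip nan; masked slots contribute the ufunc identity.
            kwargs["where"] = ~np.isnan(args[0])
            return ufunc.reduce(*args, **kwargs)
        # Anything else (accumulate, outer, maximum.reduce, ...) is not handled.
        return NotImplemented


x = NanMasked([1.0, np.nan, 3.0])
print(np.add.reduce(x))       # 4.0
print(np.multiply.reduce(x))  # 3.0
print(x + 1)                  # elementwise result is again a NanMasked
```

Returning ``NotImplemented`` for unsupported cases makes NumPy raise a ``TypeError`` instead of silently falling back to nan-propagating behaviour.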

Obviously, writing a new class is no small amount of work, but that is partly
my point: doing this right with a keyword argument would be no less
work, and, once in place, would hinder development for the regular case.

Overall, I think we should keep the nan-aware stuff separate, to be
developed and taken care of by those who want to use it.

All the best,

Marten

p.s. Implementation-wise, at least for ufunc reductions it could be
fairly straightforward, as those already have a ``where`` argument that
can be used for the ``nan_policy='omit'`` option (which selects a
different loop -- I had to take care when implementing that!).  Indeed,
we had an initial trial rewriting the nanfunctions to use this in

https://github.com/numpy/numpy/pull/12801
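Here is a sketch of how a ``nan_policy`` keyword might map onto ufunc reductions through ``where``. It assumes SciPy's three values ('propagate', 'raise', 'omit'). ``reduce_with_nan_policy`` is a hypothetical wrapper, not NumPy API, and it is not the code from the pull request above. Note that the 'propagate' path does no nan checking at all, so the regular case is untouched.

```python
import numpy as np


def reduce_with_nan_policy(ufunc, a, axis=None, nan_policy="propagate", initial=None):
    """Hypothetical sketch of a SciPy-style nan_policy for ufunc reductions."""
    a = np.asarray(a)
    if nan_policy == "propagate":
        # Regular path: no nan check at all, nan in means nan out.
        return ufunc.reduce(a, axis=axis)
    is_nan = np.isnan(a)
    if nan_policy == "raise":
        if is_nan.any():
            raise ValueError("input contains nan")
        return ufunc.reduce(a, axis=axis)
    if nan_policy == "omit":
        # Skip nan through ``where``; ufuncs without an identity
        # (e.g. np.maximum) need an explicit ``initial``.
        extra = {} if initial is None else {"initial": initial}
        return ufunc.reduce(a, axis=axis, where=~is_nan, **extra)
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")


a = np.array([[1.0, np.nan], [3.0, 4.0]])
print(reduce_with_nan_policy(np.add, a))                        # nan
print(reduce_with_nan_policy(np.add, a, nan_policy="omit"))     # 8.0
print(reduce_with_nan_policy(np.maximum, a, axis=1,
                             nan_policy="omit", initial=-np.inf))  # [1. 4.]
```

One detail a real implementation would have to settle: with ``initial=-np.inf`` an all-nan row reduces to ``-inf``. By contrast, ``np.nanmax`` returns nan (with a RuntimeWarning) for such a slice.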

One could also imagine a custom float dtype that is nan-aware, promoting
nan to the identity value as appropriate.
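Here is a rough emulation of what such a dtype would do: substitute each nan with the ufunc's identity before reducing. ``nan_to_identity`` is a hypothetical helper. Unlike a real dtype, it makes a copy of the data. It also cannot work for ufuncs such as ``np.maximum``, whose ``identity`` is ``None``.

```python
import numpy as np


def nan_to_identity(ufunc, a):
    """Hypothetical helper: replace nan by ``ufunc.identity`` (copies the data)."""
    a = np.asarray(a, dtype=float)
    if ufunc.identity is None:
        raise ValueError(f"np.{ufunc.__name__} has no identity to substitute for nan")
    return np.where(np.isnan(a), ufunc.identity, a)


a = np.array([2.0, np.nan, 5.0])
print(np.add.reduce(nan_to_identity(np.add, a)), np.nansum(a))            # 7.0 7.0
print(np.multiply.reduce(nan_to_identity(np.multiply, a)), np.nanprod(a))  # 10.0 10.0
```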

But the easiest route would still seem to be to write a new array class that
behaves slightly differently.  The required override functions can
basically be taken from astropy's Masked class.


Ralf Gommers via NumPy-Discussion <[email protected]> writes:

> On Wed, Oct 22, 2025 at 5:19 AM matti picus via NumPy-Discussion
> <[email protected]> wrote:
>
> > On Wed, Oct 22, 2025 at 6:04 AM Carlos Martin <[email protected]>
> > wrote:
> >
> > > NumPy has the following nan-ignoring functions:
> > >
> > > ...
> > >
> > > Suggestion: Replace these functions with an ignore_nan flag to their
> > > normal counterparts. This avoids having entirely separate functions
> > > just to filter out nans, and shrinks the size of the codebase. It is
> > > also more user-friendly: it is simpler and easier to toggle a boolean
> > > flag.
> > >
> > > This has been briefly suggested before:
> > >
> > > - https://mail.python.org/archives/list/[email protected]/message/FQ362NGJLOJFN3BCJVST5TAQZCVWZTNO/
> > > - https://github.com/numpy/numpy/pull/25474#issuecomment-1868484678
> >
> > As mentioned in links, SciPy has a well-defined nan_policy kwarg. If we
> > are to change this I think we should discuss adopting that policy
> > https://docs.scipy.org/doc/scipy/dev/api-dev/nan_policy.html.
>
> I think that's the direction that
> https://github.com/data-apis/array-api/issues/621 is leaning in as well.
>
> > We may need to keep the current functions around for a while as
> > aliases. I am not sure about the advantages other than consistency with
> > SciPy (which is no small thing). I doubt it will shrink the size of the
> > codebase significantly, and using the named functions explicitly is
> > just as easy as adding a kwarg. It might however be easier to teach.
>
> The main advantage is that we can add that same keyword to more niche
> reduction functions that we currently don't want as separate functions
> because it bloats the namespace too much for limited gain. See for
> example https://github.com/numpy/numpy/issues/13198 where we rejected
> adding `nanptp`. A keyword is much more lightweight so won't run into
> the same objection.
>
> Cheers,
> Ralf
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org