[Numpy-discussion] Re: Replace nan-ignoring functions with an ignore_nan flag to their normal counterparts

Ralf Gommers via NumPy-Discussion Thu, 23 Oct 2025 08:03:04 -0700

On Thu, Oct 23, 2025 at 2:18 PM Marten van Kerkwijk via NumPy-Discussion <
[email protected]> wrote:


> Hi Ralf,
>
> > So I think the relevant choices are:
> > 1. Change nothing to the current status quo (and possibly direct end
> >    users who need more than what we offer now to `marray`)
> > 2. Add a keyword to reductions
> > 3. Add a single factory function that turns regular reductions into
> >    nan-aware ones (as in
> >    https://github.com/data-apis/array-api/issues/621#issuecomment-1553481118)
> >
> > I think (1) is also a very reasonable outcome if we don't like any of
> > the alternatives.
>
> I am fine with (1), continue to dislike (2), and like (3).
>
> On (1) [status quo], you mentioned that nanptp was rejected earlier as a
> new addition to nanfunctions.  If this was because we didn't want to
> expand the main numpy namespace (reasonable!),


Indeed. There perhaps also was a "this is too niche anyway" thought, but
IIRC not polluting the main namespace was the primary consideration.


> might a sub-option be to
> allow expansion in nanfunctions for any regular function in the numpy
> namespace, but only expose them in nanfunctions itself?  An advantage
> would be that, effectively, those who like to omit NaN could just do
> "import numpy.lib.nanfunctions as np".


I'll note that that is not a public namespace right now. It could be
created of course, if there is energy.


> Of course, at that point perhaps
> one should just bite the bullet and move nanfunctions out to its own
> package...
>

Like https://github.com/pydata/bottleneck? It already has faster
nan-functions as well as some extra ones (anynan, allnan, nanrankdata). Of
course it's been on life support for a while, but it's in decent shape.

> On (2) [keyword argument], I continue to dislike the idea of adding new
> keyword arguments for the ufunc reductions -- ufuncs are one of the few
> bits of numpy API that are really nicely clean and consistent between
> many functions.  We have been very careful about extending it, and
> keeping it light.  They already allow `np.sum(data, where=~isnan(data))`,
> so it is not obvious why we would add another option to do the same thing.
> Obviously, one could argue that np.sum != np.add.reduce, so their
> signatures can diverge, but I'd personally like to move in the opposite
> direction (if only for speed for small arrays).
>

Fair enough.
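
For concreteness, here is a minimal sketch of the `where=` spelling mentioned
above, checked against the existing nan-functions. The sample array is made up
for illustration. One wrinkle it shows: `np.max` needs an explicit `initial`
when `where` is given, since maximum has no identity.

```python
import numpy as np

# Made-up sample data containing one NaN.
data = np.array([1.0, np.nan, 3.0])
mask = ~np.isnan(data)  # True where the value should be kept

# Masked reduction: NaN entries are skipped rather than propagated.
total = np.sum(data, where=mask)

# maximum has no identity, so `initial` must be supplied alongside `where`.
largest = np.max(data, where=mask, initial=-np.inf)
```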


> On (3) [factory function], I think a side benefit is that it is the
> lightest possible way to make useful what is required anyway, creating
> wrappers/implementations for functions not yet covered by nanfunctions.
>

That "lightest possible way" is why I suggested it indeed - but it seems
not many people shared my preference for that option.
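
A rough sketch of what such a factory could look like, assuming the wrapped
reduction accepts a `where` argument. The name `nan_aware` is hypothetical and
is not taken from the linked issue or from NumPy.

```python
import numpy as np


def nan_aware(reduction):
    """Hypothetical factory: wrap a reduction so that NaN entries are omitted.

    Assumes `reduction` accepts a `where=` keyword, as np.sum and np.mean do.
    """
    def wrapper(a, *args, **kwargs):
        a = np.asarray(a)
        # Build the mask from the data itself and pass it through `where`.
        return reduction(a, *args, where=~np.isnan(a), **kwargs)
    return wrapper


# Illustrative names only; these shadow nothing in NumPy.
my_nansum = nan_aware(np.sum)
my_nanmean = nan_aware(np.mean)

data = np.array([[1.0, np.nan], [3.0, 5.0]])
col_sums = my_nansum(data, axis=0)
overall_mean = my_nanmean(data)
```

Reductions without an identity, such as `np.max`, would additionally need an
`initial` value, so a real implementation would need more care than this.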

> My suggestion of a nan-as-omit Array API compatible wrapper class would
> need them, and so would extending nanfunctions to cover more cases.
> Indeed, it would even help the keyword-argument case as it would provide
> working implementations.
>
> Let me also mention again another option, of a wrapper data type which
> translates floats with NaN to floats with NaN replaced by an
> appropriate constant (the identity of the reduction by default).


I think you can't determine an appropriate value without already doing the
nan-omitting calculation? E.g. what replacement value would you use for
`np.mean`?
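
A small illustration of the problem, using made-up data: filling NaN with 0
(the identity for addition) gives the right sum, but the mean still divides by
the full element count.

```python
import numpy as np

# Made-up sample data containing one NaN.
data = np.array([1.0, np.nan, 3.0])

# Replace NaN with 0.0, the identity for addition.
filled = np.where(np.isnan(data), 0.0, data)

sum_filled = np.sum(filled)    # agrees with np.nansum(data)
mean_filled = np.mean(filled)  # divides by 3 elements, not by the 2 valid ones
mean_omitting = np.nanmean(data)
```

Here the only fill value that would leave the mean unchanged is the
nan-omitting mean itself, which is the result being computed.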


> To opt in, one would do something like,
>
> function(array.astype(NaNOmittingFloat), ...)
>
> But really one could initialize arrays like that and just keep working
> with them.  Of course, this would rely completely on Sebastian's custom
> dtype mechanism, which has already proven its worth in StringDType, but
> which would likely not be recognized by other array classes.  For that,
> a custom array class would be best (though given marray that may
> actually not be much work at all -- just need to have the mask always
> inferred instead of kept as a separate array).
>
> All the best,
>
> Marten
>
> p.s.  I liked the little summary of what other languages do in
> https://github.com/data-apis/array-api/issues/621#issuecomment-1569485778
> Julia's seemed a nice functional approach -- it seems a very interesting
> language in general, from which it is probably worth getting more ideas...
>

Agreed, Julia has some nice ideas.

Cheers,
Ralf

