Hi All,

I do not like the idea of having a new ``nan_policy`` or ``ignore_nan`` keyword argument, as I think the cost-benefit ratio is poor.
The costs I worry about are performance and increased maintenance burden for the regular, no-nan case. For instance, the "obvious" way to implement a nan-omitting sum would be to check inside the loop whether each element is nan, which slows down the regular case (e.g., by breaking vectorization). To avoid this, one has to be careful, which makes the code harder to write, more fragile, and more difficult to maintain (analogous to, but worse than, tracking floating-point errors).

To me, the benefits seem small in comparison. This is partially because, in thinking about masked arrays, I realized that skipping elements is not that often the right idea: e.g., it makes sense for taking a mean or median, but less so for summing, as the result is pretty meaningless unless one knows how many elements went in. (The logic is more obvious when one considers arrays of physical quantities; hence, the masked array class I wrote for astropy just propagates masks by default; see the reasoning at https://docs.astropy.org/en/latest/utils/masked/)

Trying to understand the use case for allowing one to omit nan instead of propagating it, it would seem that the main one is in fact to have a simple, implicit masked array. But maybe it is OK to be explicit and use ``MaskedArray(data, mask=np.isnan(data))``? With a helper function along the lines of ``np.ma.masked_invalid`` (but only masking nan), one would do ``function(mask_nans(data), ...)``, which does not seem much worse than ``function(data, ..., nan_policy='omit')``.

Continuing the analogy with masked arrays, if one dislikes the idea of carrying an explicit mask, perhaps there should be a new class that uses nan explicitly as a mask marker? That certainly is possible, by overriding numpy functions with ``__array_ufunc__`` and ``__array_function__`` to change their behaviour accordingly (or by using the Array API). This would seem a good project for a new module outside of numpy (one to which the nanfunctions could eventually be moved?).
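A minimal sketch of the explicit route, assuming ``mask_nans`` is a hypothetical helper (it is not an existing numpy function) that masks only nan, unlike ``np.ma.masked_invalid``, which also masks +/-inf. It also shows why a bare sum over the remaining elements needs the count of elements that went in before it can be interpreted:

```python
import numpy as np


def mask_nans(data):
    """Hypothetical helper: mask only nan entries.

    Unlike np.ma.masked_invalid, which masks both nan and +/-inf,
    this leaves infinities unmasked.
    """
    data = np.asarray(data)
    return np.ma.MaskedArray(data, mask=np.isnan(data))


data = np.array([1.0, np.nan, 3.0, 4.0])
m = mask_nans(data)

# np.mean/np.sum defer to the MaskedArray methods, which skip masked entries.
mean = np.mean(m)     # same value as np.nanmean(data)
total = np.sum(m)     # 8.0, but only meaningful together with...
n_used = m.count()    # ...the number of elements that went in (3)
median = np.ma.median(m)

# The difference from masked_invalid shows up with infinities.
with_inf = np.array([1.0, np.nan, np.inf])
print(mask_nans(with_inf).mask)             # [False  True False]
print(np.ma.masked_invalid(with_inf).mask)  # [False  True  True]
```

The mask travels with the data, so downstream code can decide whether omitting or propagating is appropriate, rather than the choice being baked into each function call.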
Obviously, writing a new class is not a small amount of work, but that is partially my point: doing this right with a keyword argument would be no less work, and, once in place, would hinder development for the regular case. Overall, I think we should keep the nan-aware functionality separate, to be developed and maintained by those who want to use it.

All the best,

Marten

p.s. Implementation-wise, at least for ufunc reductions it could be fairly straightforward, as those already have a ``where`` argument that can be used for the ``nan_policy='omit'`` option (which selects a different loop; I had to take care when implementing that!). Indeed, we had an initial trial rewriting the nanfunctions to use this in https://github.com/numpy/numpy/pull/12801. One could also imagine a custom float dtype that is nan-aware, promoting nan to the identity value as appropriate. But the easiest route would still seem to be to write a new array class that behaves slightly differently. The required override functions can basically be taken from astropy's Masked class.

Ralf Gommers via NumPy-Discussion <[email protected]> writes:

> On Wed, Oct 22, 2025 at 5:19 AM matti picus via NumPy-Discussion
> <[email protected]> wrote:
>
> On Wed, Oct 22, 2025 at 6:04 AM Carlos Martin <[email protected]> wrote:
>
> NumPy has the following nan-ignoring functions:
>
> ...
> Suggestion: Replace these functions with an ignore_nan flag to their
> normal counterparts. This avoids having entirely separate functions just
> to filter out nans, and shrinks the size of the codebase. It is also more
> user-friendly: it is simpler and easier to toggle a boolean flag.
>
> This has been briefly suggested before:
>
> - https://mail.python.org/archives/list/[email protected]/message/FQ362NGJLOJFN3BCJVST5TAQZCVWZTNO/
> - https://github.com/numpy/numpy/pull/25474#issuecomment-1868484678
>
> As mentioned in links, SciPy has a well-defined nan_policy kwarg.
> If we are to change this I think we should discuss adopting that policy
> https://docs.scipy.org/doc/scipy/dev/api-dev/nan_policy.html.
>
> I think that's the direction that
> https://github.com/data-apis/array-api/issues/621 is leaning in as well.
>
> We may need to keep the current functions around for a while as aliases.
> I am not sure about the advantages other than consistency with SciPy
> (which is no small thing). I doubt it will shrink the size of the
> codebase significantly, and using the named functions explicitly is just
> as easy as adding a kwarg. It might however be easier to teach.
>
> The main advantage is that we can add that same keyword to more niche
> reduction functions that we currently don't want as separate functions
> because it bloats the namespace too much for limited gain. See for
> example https://github.com/numpy/numpy/issues/13198 where we rejected
> adding `nanptp`. A keyword is much more lightweight so won't run into
> the same objection.
>
> Cheers,
> Ralf
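To make the ``where``-based approach from the p.s. concrete, here is a minimal sketch using only existing numpy reduction arguments (``where=`` and ``initial=``; ``np.mean(..., where=...)`` needs NumPy >= 1.20). It is an illustration, not the code from the PR mentioned in the p.s. Note that reductions without an identity, such as ``maximum``, need an explicit ``initial`` when ``where`` is used:

```python
import numpy as np

a = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, 2.0]])
valid = ~np.isnan(a)  # True where an element should take part

# add has an identity (0), so where= alone suffices.
s = np.add.reduce(a, axis=1, where=valid)

# maximum has no identity, so numpy requires initial= together with where=.
mx = np.maximum.reduce(a, axis=1, where=valid, initial=-np.inf)

# A mean needs the count of included elements; np.mean also accepts where=.
n = np.count_nonzero(valid, axis=1)
mean = np.mean(a, axis=1, where=valid)

# Alternative in the spirit of a nan-aware dtype: replace nan by the
# reduction's identity (0 for add) before reducing.
s_identity = np.where(valid, a, 0.0).sum(axis=1)

print(s, mx, n, mean)  # [4. 2.] [3. 2.] [2 1] [2. 2.]
```

One caveat for the design: for an all-nan row, this ``maximum`` reduction would return the ``initial`` value ``-inf``, whereas ``np.nanmax`` returns nan (with a RuntimeWarning), so a ``nan_policy='omit'`` built this way would still have to decide what empty slices should give.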
