Hi all,

with the latest changes, NumPy has a completely revised infrastructure
for:

* Creating a new array from Python objects
* Organizing casting (to align almost fully with ufuncs)
* Organizing the inner-loops and dispatching of ufuncs
* Creating and "promoting" DTypes

This means that the core functionality of the DType NEPs is
implemented:

* https://numpy.org/neps/nep-0041-improved-dtype-support.html
* https://numpy.org/neps/nep-0042-new-dtypes.html
* https://numpy.org/neps/nep-0043-extensible-ufuncs.html

This is an important milestone and it allows user-defined DTypes (and
ufuncs) with functionality beyond what was previously possible.  E.g.
see the examples in: https://github.com/seberg/experimental_user_dtypes

Current user-implemented dtypes should already be able to achieve
better integration by moving towards the new system (experimentally!).
Many dtypes (categoricals, units, datetimes, etc.) that were previously
not possible can now be implemented as well!


Future Work
-----------

The work is by no means complete, but the nature of the next steps
should change: from large and careful refactors towards small changes
that often unlock new features for user DTypes!

It also means that "feature requests" coming from users testing out the
new boundaries would be extremely helpful!  The new API pushes
boundaries and we have to map out what this means :).

One larger remaining change is that we should slowly modernize NumPy's
own ufuncs to utilize the new API within NumPy itself;  though this is
not necessarily urgent.

Here is a list of items of varying importance and urgency:

* Moving "legacy" implementations of certain `PyArray_ArrFuncs`
  (https://numpy.org/devdocs/reference/c-api/types-and-structures.html#c.PyArray_ArrFuncs)
  slots to a new API on the DType.
  This can be done incrementally, and we can ask users to keep
  using the old API and transition to the new API when it is ready. 
  Some examples are `nonzero`, sorting functions, or `copyswapn` which
  are still used within NumPy.
  (NumPy should add new API for them, and "fill" the slots with generic
  implementations where possible.)
* Some user DTypes will want to use "references" (data being allocated
  for each element) which requires cleanup (e.g. `free`, `Py_DECREF`).
  There are two potential solutions:
  - Refactor the reference counting in NumPy (would be good in
    any case)
  - Add support for DTypes that use Python objects as their "storage"
    and reuse the current code.
* Physical units will want to conveniently re-use existing NumPy ufunc
  loops (math functions).  The general infrastructure supports this,
  but API needs to be added to permit it and make it easy.
* There are many smaller things, for example:
  - Allow a user DType without a Python scalar (similar to astropy's
    Quantity), for which `arr1d[0]` returns a 0-D array.
  - A few tweaks to the current API (floating point errors and views)
* We could now "fix" `dtype="S"` to mean a string with undefined length
  and reject `np.dtype("S")` but allow `np.array([1, 2], dtype="S")`.
* I would also like to improve alignment tracking and handling (which
  is interesting for the public and private UFunc and Casting API).
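
To make the physical-units item above a bit more concrete, here is a toy
sketch in plain Python of what "re-using an existing loop" could look
like: the numeric work is delegated to a loop that only knows about
floats, and the wrapper handles only the units.  `FLOAT_LOOPS` and
`unit_multiply` are hypothetical names made up for this illustration,
not NumPy API, and how NumPy will eventually expose this is still open.

```python
import operator

# Hypothetical stand-in for an existing inner loop that works on plain floats.
FLOAT_LOOPS = {"multiply": operator.mul}


def unit_multiply(a, b):
    """Multiply two (value, unit) pairs, reusing the plain float loop.

    Units are modelled as dicts mapping a unit name to its exponent,
    e.g. {"m": 1, "s": -1} for metres per second.
    """
    a_val, a_unit = a
    b_val, b_unit = b
    # The numeric work is done by the reused float loop.
    value = FLOAT_LOOPS["multiply"](a_val, b_val)
    # Only the unit bookkeeping is specific to the unit type.
    unit = dict(a_unit)
    for name, power in b_unit.items():
        unit[name] = unit.get(name, 0) + power
    return value, unit


distance = (3.0, {"m": 1})
rate = (2.0, {"s": -1})
print(unit_multiply(distance, rate))  # (6.0, {'m': 1, 's': -1})
```

The point of the split is that the unit type never has to reimplement
the arithmetic itself.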

Generally, the API needs to be finalized and exposed, since this is
currently only experimental with certain changes expected:
https://github.com/numpy/numpy/blob/main/numpy/core/include/numpy/experimental_dtype_api.h


Value-based casting/promotion
-----------------------------

One major difficulty (and cause of inconsistencies) is the use of
value-based casting:

    arr = np.array([1, 2, 3, 4], dtype=np.uint8)
    arr + 5  # result is `uint8`
    arr + (-255)  # result is `int16`

We do not wish to support this for user DTypes, but we may want to
support "weak promotion", where the user DType knows that the other
operand was a Python integer, float, or complex and can "downpromote"
it if desired.  In the above example, the first would still work and
the second should error or warn.
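
As a rough illustration of the "weak promotion" idea, here is a toy
model in plain Python.  It does not use NumPy, and the names
`INT_RANGES` and `weak_promote_int` are made up for this sketch; the
actual promotion logic in NumPy may differ in detail.

```python
# Toy model: a Python integer is "weak" and adopts the array's dtype
# if its value fits; otherwise the operation is rejected.
# INT_RANGES and weak_promote_int are hypothetical, not NumPy API.
INT_RANGES = {
    "uint8": (0, 2**8 - 1),
    "int16": (-(2**15), 2**15 - 1),
}


def weak_promote_int(dtype_name, value):
    """Return the dtype the Python int adopts, or raise if it does not fit."""
    low, high = INT_RANGES[dtype_name]
    if not low <= value <= high:
        raise OverflowError(
            f"Python integer {value} is out of bounds for {dtype_name}"
        )
    # The value never changes the result dtype; it is only range-checked.
    return dtype_name


print(weak_promote_int("uint8", 5))  # uint8

try:
    weak_promote_int("uint8", -255)
except OverflowError as exc:
    print("error:", exc)
```

In this model the value of the Python integer never upcasts the result;
it can only cause an error (or, in a softer variant, a warning).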

To some degree, this is an extension of existing logic and it is mostly
implemented.  However, it is not yet integrated into the ufuncs for use
with user DTypes.  It would be helpful to change NumPy's behavior, but
that would likely be a major version change.
