I have submitted NEP 49 to enable user-defined allocation strategies for
the ndarray.data homogeneous memory area. The implementation is in PR
17582 (https://github.com/numpy/numpy/pull/17582). Here is the text of the NEP:

Abstract
--------
The ``numpy.ndarray`` requires additional memory allocations
to hold the ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and
``numpy.ndarray.data`` attributes. These attributes are specially allocated
after creating the Python object in the ``__new__`` method. The ``strides`` and
``shape`` are stored in a piece of memory allocated internally.

This NEP proposes a mechanism to override the memory management strategy used
for ``ndarray->data`` with user-provided alternatives. This allocation holds
the array's data and can be very large. As accessing this data often becomes
a performance bottleneck, custom allocation strategies that guarantee data
alignment or pin allocations to specialized memory hardware can enable
hardware-specific optimizations.

Motivation and Scope
--------------------
Users may wish to override the internal data memory routines with ones of their
own. Two such use-cases are to ensure data alignment and to pin certain
allocations to certain NUMA cores.

Users who wish to change the NumPy data memory management routines will use
:c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
structure to hold pointers to functions used to manage the data memory. The
calls are wrapped by internal routines that call :c:func:`PyTraceMalloc_Track`
and :c:func:`PyTraceMalloc_Untrack`, and that use the
:c:func:`PyDataMem_EventHookFunc` mechanism already present in NumPy for
auditing purposes.
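
To make the idea of a handler structure concrete, here is a minimal,
self-contained sketch of a struct of function pointers plus a 64-byte-aligned
allocator. It does not use the NumPy headers: ``demo_handler``,
``demo_set_handler``, ``demo_get_handler`` and ``DEMO_ALIGN`` are hypothetical
names invented for illustration, and the real ``PyDataMem_Handler`` layout and
``PyDataMem_SetHandler`` signature may well differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ALIGN 64  /* assumed alignment in bytes; must be a power of two */

/* Hypothetical stand-in for a handler struct; NOT the NumPy definition. */
typedef struct {
    char name[32];
    void *(*alloc_fn)(size_t size);
    void (*free_fn)(void *ptr, size_t size);
} demo_handler;

/* Default routines: thin wrappers around malloc/free. */
static void *default_alloc(size_t size) { return malloc(size); }
static void default_free(void *ptr, size_t size) { (void)size; free(ptr); }

/* Aligned routines: over-allocate, round the address up, and stash the
 * pointer returned by malloc just in front of the aligned block. */
static void *demo_aligned_alloc(size_t size)
{
    assert((DEMO_ALIGN & (DEMO_ALIGN - 1)) == 0);
    if (size > SIZE_MAX - DEMO_ALIGN - sizeof(void *)) {
        return NULL;  /* the padded request would overflow size_t */
    }
    void *raw = malloc(size + DEMO_ALIGN + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + DEMO_ALIGN - 1) & ~(uintptr_t)(DEMO_ALIGN - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

static void demo_aligned_free(void *ptr, size_t size)
{
    (void)size;
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);  /* recover the pointer malloc returned */
    }
}

const demo_handler demo_default_handler = {
    "default", default_alloc, default_free
};
const demo_handler demo_aligned_handler = {
    "aligned64", demo_aligned_alloc, demo_aligned_free
};

static const demo_handler *current_handler = &demo_default_handler;

/* Install a new handler and return the previous one. Treating NULL as
 * "restore the default" is a choice made for this sketch. */
const demo_handler *demo_set_handler(const demo_handler *handler)
{
    const demo_handler *old = current_handler;
    current_handler = (handler != NULL) ? handler : &demo_default_handler;
    return old;
}

const demo_handler *demo_get_handler(void)
{
    return current_handler;
}
```

Returning the previous handler from the setter lets a caller restore it later,
which is convenient when the override is only wanted for part of a program.
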

A call to ``PyDataMem_SetHandler`` changes the default functions, and it may
happen during the lifetime of an ``ndarray`` object. Each ``ndarray`` will
therefore carry with it the ``PyDataMem_Handler`` struct in use at the time of
its instantiation, and those functions will be used to reallocate or free the
data memory of the instance. Internally NumPy may use ``memcpy`` or ``memset``
on the data ``ptr``.
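
The reason each instance carries its handler can be shown with a small
standalone sketch. Everything here is hypothetical (``sketch_handler``,
``sketch_array``, the ``sketch_live_*`` counters) and stands in for the real
NumPy structures only loosely: an array allocated under handler A must still be
freed by handler A, even if the default has been switched to handler B in the
meantime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler: a name plus alloc/free function pointers. */
typedef struct {
    const char *name;
    void *(*alloc_fn)(size_t size);
    void (*free_fn)(void *ptr, size_t size);
} sketch_handler;

/* Outstanding blocks per handler, so mismatched frees become visible. */
int sketch_live_a = 0;
int sketch_live_b = 0;

static void *a_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) { sketch_live_a++; }
    return p;
}
static void a_free(void *p, size_t n)
{
    (void)n;
    if (p != NULL) { sketch_live_a--; free(p); }
}
static void *b_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) { sketch_live_b++; }
    return p;
}
static void b_free(void *p, size_t n)
{
    (void)n;
    if (p != NULL) { sketch_live_b--; free(p); }
}

const sketch_handler sketch_handler_a = {"a", a_alloc, a_free};
const sketch_handler sketch_handler_b = {"b", b_alloc, b_free};

/* The process-wide default, which may change at any time. */
const sketch_handler *sketch_current = &sketch_handler_a;

/* Hypothetical array object: remembers the handler that created its data. */
typedef struct {
    void *data;
    size_t nbytes;
    const sketch_handler *handler;  /* captured at instantiation */
} sketch_array;

sketch_array *sketch_array_new(size_t nbytes)
{
    sketch_array *arr = malloc(sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    arr->handler = sketch_current;  /* snapshot of the current default */
    arr->nbytes = nbytes;
    arr->data = arr->handler->alloc_fn(nbytes);
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    memset(arr->data, 0, nbytes);  /* plain memset on the data pointer */
    return arr;
}

void sketch_array_free(sketch_array *arr)
{
    if (arr == NULL) {
        return;
    }
    /* Use the stored handler, not sketch_current. */
    arr->handler->free_fn(arr->data, arr->nbytes);
    free(arr);
}
```

If ``sketch_array_free`` consulted ``sketch_current`` instead, memory obtained
from one allocator could be handed to another allocator's free routine.
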
Usage and Impact
----------------
The new functions can only be accessed via the NumPy C-API. An example is
included later in the NEP. The added ``struct`` will increase the size of the
``ndarray`` object. This is one of the major drawbacks of this approach. We can
be reasonably sure that the change in size will have a minimal impact on
end-user code because NumPy version 1.20 already changed the object size.

Backward compatibility
----------------------
The design will not break backward compatibility. Projects that were assigning
to the ``ndarray->data`` pointer were already breaking the current memory
management strategy (backed by ``npy_alloc_cache``) and should restore
``ndarray->data`` before calling ``Py_DECREF``. As mentioned above, the change
in size should not impact end-users.

Matti