On Tue, Apr 20, 2021 at 2:18 PM Matti Picus <matti.pi...@gmail.com> wrote:
> I have submitted NEP 49 to enable user-defined allocation strategies for
> the ndarray.data homogeneous memory area. The implementation is in PR
> 17582 https://github.com/numpy/numpy/pull/17582 Here is the text of the
> NEP:

Thanks Matti!

> Abstract
> --------
>
> The ``numpy.ndarray`` requires additional memory allocations
> to hold ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and
> ``numpy.ndarray.data`` attributes. These attributes are specially allocated
> after creating the Python object in the ``__new__`` method. The ``strides``
> and ``shape`` are stored in a piece of memory allocated internally.
>
> This NEP proposes a mechanism to override the memory management strategy
> used for ``ndarray->data`` with user-provided alternatives. This allocation
> holds the array's data and can be very large. As accessing this data often
> becomes a performance bottleneck, custom allocation strategies to guarantee
> data alignment or pinning allocations to specialized memory hardware can
> enable hardware-specific optimizations.
>
> Motivation and Scope
> --------------------
>
> Users may wish to override the internal data memory routines with ones
> of their own. Two such use-cases are to ensure data alignment and to pin
> certain allocations to certain NUMA cores.

It would be great to expand a bit on these two sentences, and add some
links. There's a lot of history here in NumPy development to refer to as
well:

https://numpy-discussion.scipy.narkive.com/MvmMkJcK/numpy-arrays-data-allocation-and-simd-alignement
http://numpy-discussion.10968.n7.nabble.com/Aligned-configurable-memory-allocation-td39712.html
http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-memory-td1533.html
https://github.com/numpy/numpy/issues/5312
https://github.com/numpy/numpy/issues/14177

There must also be a good amount of ideas/discussion elsewhere.
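
For readers unfamiliar with the data-alignment use case mentioned in the quoted text, here is a minimal sketch in plain C of what "an aligned data allocation" means. It is not NumPy code and not the NEP's API: ``alloc_aligned`` and ``is_aligned`` are hypothetical helper names, and the 64-byte figure in the usage note is only an assumed cache-line/SIMD width.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a NumPy function): report whether `ptr` sits on
 * an `alignment`-byte boundary. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0);  /* avoid a modulo by zero */
    return (uintptr_t)ptr % alignment == 0;
}

/* Hypothetical helper (not a NumPy function): allocate at least `size`
 * bytes on an `alignment`-byte boundary using C11 aligned_alloc.
 * Returns NULL if `alignment` is not a power of two or on failure.
 * The caller releases the block with free(). */
static void *alloc_aligned(size_t alignment, size_t size)
{
    size_t rounded;

    /* Only accept power-of-two alignments. */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL;
    }
    /* Guard against overflow in the rounding below. */
    if (size > SIZE_MAX - (alignment - 1)) {
        return NULL;
    }
    /* C11 requires the size to be a multiple of the alignment. */
    rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded == 0) {
        rounded = alignment;
    }
    return aligned_alloc(alignment, rounded);
}
```

A caller would request, say, ``alloc_aligned(64, 1000 * sizeof(double))`` and could confirm the result with ``is_aligned(ptr, 64)``. A plain ``malloc`` gives no such guarantee beyond the platform's default alignment, which is the gap a user-supplied allocator would fill.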
https://bugs.python.org/issue18835 discussed an aligned allocator for Python
itself, with fairly detailed discussion about whether/how NumPy could
benefit. The conclusion (I think) was that it shouldn't be in Python, and
that NumPy/Arrow/others are better off doing their own thing.

I'm wondering if improved memory profiling is a use case as well? Fil
(https://github.com/pythonspeed/filprofiler) for example seems to use such a
strategy:
https://github.com/pythonspeed/filprofiler/blob/master/design/allocator-overrides.md

Does it interact with our tracemalloc support
(https://numpy.org/doc/stable/release/1.13.0-notes.html#support-for-tracemalloc-in-python-3-6)?

> Users who wish to change the NumPy data memory management routines will use

This is design, not motivation or scope. Try to not refer to specific
function names in this section. I suggest moving this content to the
"Detailed design" section (or better, a "high level design" at the start of
that section).

Cheers,
Ralf

> :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
> structure to hold pointers to functions used to manage the data memory. The
> calls are wrapped by internal routines to call
> :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will
> use the :c:func:`PyDataMem_EventHookFunc` mechanism already present in
> NumPy for auditing purposes.
>
> Since a call to ``PyDataMem_SetHandler`` will change the default
> functions, but that function may be called during the lifetime of an
> ``ndarray`` object, each ``ndarray`` will carry with it the
> ``PyDataMem_Handler`` struct used at the time of its instantiation, and
> these will be used to reallocate or free the data memory of the instance.
> Internally NumPy may use ``memcpy`` or ``memset`` on the data ``ptr``.
>
> Usage and Impact
> ----------------
>
> The new functions can only be accessed via the NumPy C-API. An example is
> included later in the NEP.
> The added ``struct`` will increase the size of the ``ndarray`` object. It
> is one of the major drawbacks of this approach. We can be reasonably sure
> that the change in size will have a minimal impact on end-user code because
> NumPy version 1.20 already changed the object size.
>
> Backward compatibility
> ----------------------
>
> The design will not break backward compatibility. Projects that were
> assigning to the ``ndarray->data`` pointer were already breaking the
> current memory management strategy (backed by ``npy_alloc_cache``) and
> should restore ``ndarray->data`` before calling ``Py_DECREF``. As mentioned
> above, the change in size should not impact end-users.
>
> Matti
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
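
The quoted design has each array carry the handler that was active when it was created, so that a later change of the default does not affect how existing data is freed. A minimal sketch of that idea in plain C follows. Every name in it (``toy_handler``, ``toy_array``, ``toy_set_handler`` and so on) is hypothetical and chosen for illustration; the actual ``PyDataMem_Handler`` layout and ``PyDataMem_SetHandler`` signature are defined by the NEP and PR 17582, not by this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a handler struct: a table of function
 * pointers used to manage data memory. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} toy_handler;

/* Hypothetical stand-in for an array object: it remembers the handler
 * that allocated its data. */
typedef struct {
    void *data;
    const toy_handler *handler;
} toy_array;

static void *default_alloc(size_t size) { return malloc(size); }
static void default_release(void *ptr) { free(ptr); }
static const toy_handler default_handler = {default_alloc, default_release};

/* A second handler that counts live allocations, e.g. for profiling. */
static int counted_live = 0;
static void *counted_alloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        counted_live++;
    }
    return ptr;
}
static void counted_release(void *ptr)
{
    if (ptr != NULL) {
        counted_live--;
    }
    free(ptr);
}
static const toy_handler counting_handler = {counted_alloc, counted_release};

/* Process-wide default used for newly created arrays. */
static const toy_handler *current_handler = &default_handler;

/* Install a new default handler and return the previous one. */
static const toy_handler *toy_set_handler(const toy_handler *handler)
{
    const toy_handler *old = current_handler;
    current_handler = handler;
    return old;
}

/* Capture the current handler at creation time; return 0 on success. */
static int toy_array_init(toy_array *arr, size_t nbytes)
{
    arr->handler = current_handler;
    arr->data = arr->handler->alloc(nbytes);
    return arr->data != NULL ? 0 : -1;
}

/* Free with the handler stored on the array, not the current default. */
static void toy_array_free(toy_array *arr)
{
    assert(arr->handler != NULL);
    arr->handler->release(arr->data);
    arr->data = NULL;
}
```

If an array is created while ``counting_handler`` is installed and the default is then switched back, ``toy_array_free`` still goes through ``counted_release``, so allocation and release always pair up. The cost is the extra pointer stored on every array, which is the object-size increase the quoted text lists as a drawback.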