> This means that ndarray needs to know about ufuncs – so instead of a clean layering, we have a circular dependency.
Perhaps we should split ndarray into a base_ndarray class with no arithmetic support (__add__, sum, etc.), and then provide an ndarray subclass from umath instead (either the separate extension, or just a different set of files).

On Thu, 8 Mar 2018 at 08:25 Nathaniel Smith <n...@pobox.com> wrote:

> Hi all,
>
> Well, this is something that we've discussed for a while and I think
> generally has consensus already, but I figured I'd write it down
> anyway to make sure.
>
> There's a rendered version here:
>
> https://github.com/njsmith/numpy/blob/nep-0015-merge-multiarray-umath/doc/neps/nep-0015-merge-multiarray-umath.rst
>
> -----
>
> ============================
> Merging multiarray and umath
> ============================
>
> :Author: Nathaniel J. Smith <n...@pobox.com>
> :Status: Draft
> :Type: Standards Track
> :Created: 2018-02-22
>
>
> Abstract
> --------
>
> Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a
> single extension module, and deprecate ``np.set_numeric_ops``.
>
>
> Background
> ----------
>
> Currently, numpy's core C code is split between two separate extension
> modules.
>
> ``numpy.core.multiarray`` is built from
> ``numpy/core/src/multiarray/*.c``, and contains the core array
> functionality (in particular, the ``ndarray`` object).
>
> ``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and
> contains the ufunc machinery.
>
> These two modules each expose their own separate C API, accessed via
> ``import_multiarray()`` and ``import_umath()`` respectively. The idea
> is that they're supposed to be independent modules, with
> ``multiarray`` as a lower-level layer with ``umath`` built on top. In
> practice this has turned out to be problematic.
>
> First, the layering isn't perfect: when you write ``ndarray +
> ndarray``, this invokes ``ndarray.__add__``, which then calls the
> ufunc ``np.add``. This means that ``ndarray`` needs to know about
> ufuncs – so instead of a clean layering, we have a circular
> dependency.
> To solve this, ``multiarray`` exports a somewhat
> terrifying function called ``set_numeric_ops``. The bootstrap
> procedure each time you ``import numpy`` is:
>
> 1. ``multiarray`` and its ``ndarray`` object are loaded, but
>    arithmetic operations on ndarrays are broken.
>
> 2. ``umath`` is loaded.
>
> 3. ``set_numeric_ops`` is used to monkeypatch all the methods like
>    ``ndarray.__add__`` with objects from ``umath``.
>
> In addition, ``set_numeric_ops`` is exposed as a public API,
> ``np.set_numeric_ops``.
>
> Furthermore, even when this layering does work, it ends up distorting
> the shape of our public ABI. In recent years, the most common reason
> for adding new functions to ``multiarray``\'s "public" ABI is not that
> they really need to be public or that we expect other projects to use
> them, but rather just that we need to call them from ``umath``. This
> is extremely unfortunate, because it makes our public ABI
> unnecessarily large, and since we can never remove things from it then
> this creates an ongoing maintenance burden. The way C works, you can
> have internal API that's visible to everything inside the same
> extension module, or you can have a public API that everyone can use;
> you can't have an API that's visible to multiple extension modules
> inside numpy, but not to external users.
>
> We've also increasingly been putting utility code into
> ``numpy/core/src/private/``, which now contains a bunch of files which
> are ``#include``\d twice, once into ``multiarray`` and once into
> ``umath``. This is pretty gross, and is purely a workaround for these
> being separate C extensions.
>
>
> Proposed changes
> ----------------
>
> This NEP proposes three changes:
>
> 1. We should start building ``numpy/core/src/multiarray/*.c`` and
>    ``numpy/core/src/umath/*.c`` together into a single extension
>    module.
>
> 2. Instead of ``set_numeric_ops``, we should use some new, private API
>    to set up ``ndarray.__add__`` and friends.
>
> 3. We should deprecate, and eventually remove, ``np.set_numeric_ops``.
>
>
> Non-proposed changes
> --------------------
>
> We don't necessarily propose to throw away the distinction between
> multiarray/ and umath/ in terms of our source code organization:
> internal organization is useful! We just want to build them together
> into a single extension module. Of course, this does open the door for
> potential future refactorings, which we can then evaluate based on
> their merits as they come up.
>
> It also doesn't propose that we break the public C ABI. We should
> continue to provide ``import_multiarray()`` and ``import_umath()``
> functions – it's just that now both ABIs will ultimately be loaded
> from the same C library. Due to how ``import_multiarray()`` and
> ``import_umath()`` are written, we'll also still need to have modules
> called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll
> need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects –
> but we can make one or both of these modules be tiny shims that simply
> re-export the magic API object from where-ever it's actually defined.
> (See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for
> details of how these imports work.)
>
>
> Backward compatibility
> ----------------------
>
> The only compatibility break is the deprecation of ``np.set_numeric_ops``.
>
>
> Alternatives
> ------------
>
> n/a
>
>
> Discussion
> ----------
>
> TBD
>
>
> Copyright
> ---------
>
> This document has been placed in the public domain.
>
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
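
To make the comparison concrete, here is a minimal pure-Python sketch (not NumPy's actual C implementation) of the two layouts under discussion: the monkeypatching bootstrap that the quoted NEP describes, and the base_ndarray split suggested at the top of this message. ToyArray, BaseArray, Array, toy_add and toy_set_numeric_ops are all hypothetical names invented for illustration; none of them exist in NumPy.

```python
# Toy model only: every name here is hypothetical and stands in for a
# piece of NumPy's C code; nothing below calls NumPy itself.


class ToyArray:
    """Stand-in for ndarray as defined by multiarray: data, no arithmetic."""

    def __init__(self, values):
        self.values = list(values)


def toy_add(a, b):
    """Stand-in for the ufunc np.add, which lives in the umath layer."""
    return type(a)(x + y for x, y in zip(a.values, b.values))


def toy_set_numeric_ops(cls, **ops):
    """Stand-in for set_numeric_ops: patch operator methods onto cls.

    toy_set_numeric_ops(ToyArray, add=toy_add) installs ToyArray.__add__.
    """
    for name, func in ops.items():
        setattr(cls, f"__{name}__", func)


# Alternative layout: the lower layer defines a class with no arithmetic,
# and the ufunc layer supplies the subclass that users actually see.
class BaseArray:
    """Stand-in for the suggested base_ndarray: storage only."""

    def __init__(self, values):
        self.values = list(values)


class Array(BaseArray):
    """Stand-in for an ndarray subclass provided by the ufunc layer."""

    def __add__(self, other):
        return toy_add(self, other)


if __name__ == "__main__":
    # Layout 1: arithmetic only works after the patching step has run.
    toy_set_numeric_ops(ToyArray, add=toy_add)
    print((ToyArray([1, 2]) + ToyArray([3, 4])).values)

    # Layout 2: no patching; the subclass carries the operators.
    print((Array([1, 2]) + Array([3, 4])).values)
```

In the second layout the arithmetic-free base class never has to be patched after the fact; the trade-off is that the class users interact with would be defined in the ufunc layer rather than alongside the core array code.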