On Thu, Mar 8, 2018 at 1:25 AM, Nathaniel Smith <n...@pobox.com> wrote:

> Hi all,
>
> Well, this is something that we've discussed for a while and I think
> generally has consensus already, but I figured I'd write it down
> anyway to make sure.
>
> There's a rendered version here:
> https://github.com/njsmith/numpy/blob/nep-0015-merge-multiarray-umath/doc/neps/nep-0015-merge-multiarray-umath.rst
>
> -----
>
> ============================
> Merging multiarray and umath
> ============================
>
> :Author: Nathaniel J. Smith <n...@pobox.com>
> :Status: Draft
> :Type: Standards Track
> :Created: 2018-02-22
>
>
> Abstract
> --------
>
> Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a
> single extension module, and deprecate ``np.set_numeric_ops``.
>
>
> Background
> ----------
>
> Currently, numpy's core C code is split between two separate extension
> modules.
>
> ``numpy.core.multiarray`` is built from
> ``numpy/core/src/multiarray/*.c``, and contains the core array
> functionality (in particular, the ``ndarray`` object).
>
> ``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and
> contains the ufunc machinery.
>
> These two modules each expose their own separate C API, accessed via
> ``import_multiarray()`` and ``import_umath()`` respectively. The idea
> is that they're supposed to be independent modules, with
> ``multiarray`` as a lower-level layer with ``umath`` built on top. In
> practice this has turned out to be problematic.
>
> First, the layering isn't perfect: when you write ``ndarray +
> ndarray``, this invokes ``ndarray.__add__``, which then calls the
> ufunc ``np.add``. This means that ``ndarray`` needs to know about
> ufuncs – so instead of a clean layering, we have a circular
> dependency. To solve this, ``multiarray`` exports a somewhat
> terrifying function called ``set_numeric_ops``. The bootstrap
> procedure each time you ``import numpy`` is:
>
> 1. ``multiarray`` and its ``ndarray`` object are loaded, but
>    arithmetic operations on ndarrays are broken.
>
> 2. ``umath`` is loaded.
>
> 3. ``set_numeric_ops`` is used to monkeypatch all the methods like
>    ``ndarray.__add__`` with objects from ``umath``.
>
> In addition, ``set_numeric_ops`` is exposed as a public API,
> ``np.set_numeric_ops``.
>
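
For anyone who finds the three bootstrap steps above easier to follow in code, here is a toy pure-Python model of them. It is only a sketch: ``ToyArray``, ``toy_add`` and the ``_numeric_ops`` table are hypothetical stand-ins invented for illustration, and the real machinery is C code inside ``multiarray`` and ``umath``.

```python
# Toy model of the two-module bootstrap. Every name here is a hypothetical
# stand-in; none of this is numpy's actual implementation.

_numeric_ops = {}  # belongs to the "multiarray" layer; empty until patched


def set_numeric_ops(**ops):
    """Install arithmetic callables and return the previously installed ones."""
    old = dict(_numeric_ops)
    _numeric_ops.update(ops)
    return old


class ToyArray:
    """Stand-in for ndarray: knows how to dispatch, not how to add."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        try:
            add = _numeric_ops["add"]
        except KeyError:
            raise TypeError("arithmetic is not set up yet") from None
        return add(self, other)


def toy_add(a, b):
    """Stand-in for the ufunc np.add, which lives in the "umath" layer."""
    return ToyArray(x + y for x, y in zip(a.data, b.data))


# Step 1: the array type exists, but arithmetic on it is broken.
a, b = ToyArray([1, 2]), ToyArray([10, 20])
try:
    a + b
    broken_before_patch = False
except TypeError:
    broken_before_patch = True

# Steps 2 and 3: the "umath" layer is loaded and patched into the array type.
set_numeric_ops(add=toy_add)
result = (a + b).data
```

The point of the sketch is the ordering problem: ``ToyArray.__add__`` cannot work until something outside its own layer has been registered, which is the circular dependency described above.
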
> Furthermore, even when this layering does work, it ends up distorting
> the shape of our public ABI. In recent years, the most common reason
> for adding new functions to ``multiarray``\'s "public" ABI is not that
> they really need to be public or that we expect other projects to use
> them, but rather just that we need to call them from ``umath``. This
> is extremely unfortunate, because it makes our public ABI
> unnecessarily large, and since we can never remove things from it,
> this creates an ongoing maintenance burden. The way C works, you can
> have internal API that's visible to everything inside the same
> extension module, or you can have a public API that everyone can use;
> you can't have an API that's visible to multiple extension modules
> inside numpy, but not to external users.
>
> We've also increasingly been putting utility code into
> ``numpy/core/src/private/``, which now contains a bunch of files which
> are ``#include``\d twice, once into ``multiarray`` and once into
> ``umath``. This is pretty gross, and is purely a workaround for these
> being separate C extensions.
>
>
> Proposed changes
> ----------------
>
> This NEP proposes three changes:
>
> 1. We should start building ``numpy/core/src/multiarray/*.c`` and
>    ``numpy/core/src/umath/*.c`` together into a single extension
>    module.
>
> 2. Instead of ``set_numeric_ops``, we should use some new, private API
>    to set up ``ndarray.__add__`` and friends.
>
> 3. We should deprecate, and eventually remove, ``np.set_numeric_ops``.
>
>
> Non-proposed changes
> --------------------
>
> We don't necessarily propose to throw away the distinction between
> multiarray/ and umath/ in terms of our source code organization:
> internal organization is useful! We just want to build them together
> into a single extension module. Of course, this does open the door for
> potential future refactorings, which we can then evaluate based on
> their merits as they come up.
>
> This NEP also doesn't propose that we break the public C ABI. We should
> continue to provide ``import_multiarray()`` and ``import_umath()``
> functions – it's just that now both ABIs will ultimately be loaded
> from the same C library. Due to how ``import_multiarray()`` and
> ``import_umath()`` are written, we'll also still need to have modules
> called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll
> need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects –
> but we can make one or both of these modules be tiny shims that simply
> re-export the magic API object from wherever it's actually defined.
> (See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for
> details of how these imports work.)
>
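
As a rough illustration of the "tiny shim" idea in the paragraph above, the following Python sketch builds a pretend combined module and two shim modules that re-export its API objects. The module names ``toy_combined``, ``toy_multiarray`` and ``toy_umath`` and the helper ``make_shim`` are assumptions made up for this example, and plain ``object()`` instances stand in for the opaque API objects that the real C modules export.

```python
import types

# Hypothetical combined extension module. In numpy this would be compiled C
# code; here it is an empty module object with two placeholder attributes.
combined = types.ModuleType("toy_combined")
combined._ARRAY_API = object()  # placeholder for the array C API object
combined._UFUNC_API = object()  # placeholder for the ufunc C API object


def make_shim(name, attr):
    """Create a module that only re-exports one attribute of `combined`."""
    shim = types.ModuleType(name)
    setattr(shim, attr, getattr(combined, attr))
    return shim


# Two shims that keep the old module names and attributes available.
multiarray_shim = make_shim("toy_multiarray", "_ARRAY_API")
umath_shim = make_shim("toy_umath", "_UFUNC_API")

# Code that looks up the API object by the old name gets the same object.
same_array_api = multiarray_shim._ARRAY_API is combined._ARRAY_API
same_ufunc_api = umath_shim._UFUNC_API is combined._UFUNC_API
```

Under these assumptions, anything that looks up ``_ARRAY_API`` or ``_UFUNC_API`` on the old module names would keep working, since each shim hands back the very object defined in the combined module.
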
>
> Backward compatibility
> ----------------------
>
> The only compatibility break is the deprecation of ``np.set_numeric_ops``.
>
>
> Alternatives
> ------------
>
> n/a
>
>
> Discussion
> ----------
>
> TBD
>
>
> Copyright
> ---------
>
> This document has been placed in the public domain.
>

If we accept this NEP, I'd like to get it done soon, preferably in the
next few months, so that it is finished before we drop Python 2.7 support.
That will make maintenance of the NumPy long term support release through
2019 easier.

Chuck