On Thu, 2020-02-06 at 09:35 -0800, Stephan Hoyer wrote: > On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller <t3k...@gmail.com> > wrote: <snip> > > > - We use scipy.linalg in many places, and we would need to do a > > separate dispatching to check whether we can use module.linalg > > instead > > (that might be an issue for many libraries but I'm not sure). > > > > This brings up a good question -- obviously the final decision here > is up to SciPy maintainers, but how should we encourage SciPy to > support dispatching? > We could pretty easily make __array_function__ cover SciPy by simply > exposing NumPy's internal utilities. SciPy could simply use the > np.array_function_dispatch decorator internally and that would be > enough.
Hmmm, in NumPy we can easily force basically 100% of (desired) coverage, i.e. JAX can return a namespace that implements everything. With SciPy that is already much less feasible, and as you go to domain specific tools it seems implausible. `get_array_module` solves the issue of a library that wants to support all array likes, as long as:

* most functions rely only on the NumPy API
* the domain specific library is expected to implement support for specific array objects if necessary. E.g. sklearn can include special code for Dask support. Dask does not replace sklearn code.

> It is less clear how this could work for __array_module__, because
> __array_module__ and get_array_module() are not generic -- they
> refer explicitly to a NumPy like module. If we want to extend it to
> SciPy (for which I agree there are good use-cases), what should that
> look like?

I suppose the question here is: where should the code reside? For SciPy, I agree there is a good reason why you may want to "reverse" the implementation. The code to support JAX arrays should live inside JAX.

One, probably silly, option is to return a "global" namespace, so that:

    np = get_array_module(*arrays).numpy

We have two distinct issues: Where should e.g. SciPy put a generic implementation (assuming they want to provide implementations that only require NumPy-API support, so as not to require overriding)? And, if a library provides generic support, should we define a standard for how the context/namespace may be passed in/provided? sklearn's main namespace is expected to support many array objects/types, but it could be nice to pass in an already known context/namespace (say scikit-image already found it, and then calls scikit-learn internally). A "generic" namespace may even require this to infer the correct output array object.

Another thing about backward compatibility: What is our vision there actually? This NEP will *not* give the *end user* the option to opt-in!
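(To make that context/namespace passing a bit more concrete, here is a rough sketch of what a downstream function could look like. The `xp=` keyword is a made-up convention, not anything proposed in the NEP, and since `get_array_module` does not exist yet, the sketch just falls back to NumPy:)

```python
import numpy as np

def normalize(x, xp=None):
    # `xp` is a hypothetical, already-resolved namespace ("context")
    # handed in by a caller (say, scikit-image calling scikit-learn),
    # so that we do not have to re-dispatch internally.
    if xp is None:
        # Would be np.get_array_module(x) once NEP 37 is implemented;
        # plain NumPy is the only option in this sketch.
        xp = np
    lo, hi = xp.min(x), xp.max(x)
    return (x - lo) / (hi - lo)

print(normalize(np.array([2.0, 4.0, 6.0])))  # rescaled to [0, 1]
```

A "generic" SciPy-style implementation could take the same shape: accept an optional resolved namespace, and only fall back to its own dispatch when none is given.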
Here, opt-in is really reserved to the *library user* (e.g. sklearn). (I did not realize this clearly before.) Thinking about that for a bit now, that seems like the right choice. But it also means that the library requires an easy way of giving a FutureWarning, to notify the end-user of the upcoming change. The end-user will easily be able to convert to a NumPy array to keep the old behaviour.

Once this warning is given (maybe during `get_array_module()`), the array module object/context would preferably be passed around, hopefully even between libraries. That provides a reasonable way to opt in to the new behaviour without a warning (mainly for library users; end-users can silence the warning if they wish).

- Sebastian

> The obvious choices would be to either add a new protocol, e.g.,
> __scipy_module__ (but then NumPy needs to know about SciPy), or to
> add some sort of "module request" parameter to np.get_array_module(),
> to indicate the requested API, e.g., np.get_array_module(*arrays,
> matching='scipy'). This is pretty similar to the "default" argument
> but would need to get passed into the __array_module__ protocol, too.
>
> > - Some models have several possible optimization algorithms, some
> > of which are pure numpy and some which are Cython. If someone
> > provides a different array module, we might want to choose an
> > algorithm that is actually supported by that module. While this
> > exact issue is maybe sklearn specific, a similar issue could appear
> > for most downstream libs that use Cython in some places.
> > Many Cython algorithms could be implemented in pure numpy with a
> > potential slowdown, but once we have NEP 37 there might be a
> > benefit to having a pure NumPy implementation as an alternative
> > code path.
> >
> > Anyway, NEP 37 seems a great step in the right direction and would
> > enable sklearn to actually dispatch in some places. Dispatching
> > just based on __array_function__ seems not really feasible so far.
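(Coming back to the FutureWarning point above: a library could funnel all of its module lookups through one small helper that warns the end-user whenever dispatching would leave plain-NumPy behaviour. A rough sketch, with `get_array_module` simulated in pure Python since NEP 37 is not implemented yet, and all names hypothetical:)

```python
import warnings
import numpy as np

def _get_array_module(*arrays, default=np):
    # Minimal stand-in for the proposed np.get_array_module().
    types = tuple({type(a) for a in arrays})
    for a in arrays:
        if hasattr(type(a), '__array_module__'):
            result = a.__array_module__(types)
            if result is not NotImplemented:
                return result
    return default

def get_module_with_warning(*arrays):
    # Library-side wrapper: warn end-users once behaviour is about
    # to be dispatched away from plain NumPy.
    module = _get_array_module(*arrays)
    if module is not np:
        warnings.warn(
            "non-NumPy inputs will dispatch to a different array module "
            "in a future release; use np.asarray() to keep the old "
            "behaviour", FutureWarning, stacklevel=2)
    return module

# Plain NumPy arrays resolve silently to numpy itself.
assert get_module_with_warning(np.ones(3)) is np
```

Library users who already opted in would call the unwrapped lookup (or pass the resolved module around), so only end-users who have not migrated see the warning.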
> > > > Best, > > Andreas Mueller > > > > > > On 1/6/20 11:29 PM, Stephan Hoyer wrote: > > > I am pleased to present a new NumPy Enhancement Proposal for > > > discussion: "NEP-37: A dispatch protocol for NumPy-like modules." > > > Feedback would be very welcome! > > > > > > The full text follows. The rendered proposal can also be found > > > online at https://numpy.org/neps/nep-0037-array-module.html > > > > > > Best, > > > Stephan Hoyer > > > > > > =================================================== > > > NEP 37 — A dispatch protocol for NumPy-like modules > > > =================================================== > > > > > > :Author: Stephan Hoyer <sho...@google.com> > > > :Author: Hameer Abbasi > > > :Author: Sebastian Berg > > > :Status: Draft > > > :Type: Standards Track > > > :Created: 2019-12-29 > > > > > > Abstract > > > -------- > > > > > > NEP-18's ``__array_function__`` has been a mixed success. Some > > > projects (e.g., > > > dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted > > > it. Others > > > (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we > > > propose a new > > > protocol, ``__array_module__``, that we expect could eventually > > > subsume most > > > use-cases for ``__array_function__``. The protocol requires > > > explicit adoption > > > by both users and library authors, which ensures backwards > > > compatibility, and > > > is also significantly simpler than ``__array_function__``, both > > > of which we > > > expect will make it easier to adopt. > > > > > > Why ``__array_function__`` hasn't been enough > > > --------------------------------------------- > > > > > > There are two broad ways in which NEP-18 has fallen short of its > > > goals: > > > > > > 1. **Maintainability concerns**. 
`__array_function__` has > > > significant > > > implications for libraries that use it: > > > > > > - Projects like `PyTorch > > > <https://github.com/pytorch/pytorch/issues/22402>`_, `JAX > > > <https://github.com/google/jax/issues/1565>`_ and even > > > `scipy.sparse > > > <https://github.com/scipy/scipy/issues/10362>`_ have been > > > reluctant to > > > implement `__array_function__` in part because they are > > > concerned about > > > **breaking existing code**: users expect NumPy functions > > > like > > > ``np.concatenate`` to return NumPy arrays. This is a > > > fundamental > > > limitation of the ``__array_function__`` design, which we > > > chose to allow > > > overriding the existing ``numpy`` namespace. > > > - ``__array_function__`` currently requires an "all or > > > nothing" approach to > > > implementing NumPy's API. There is no good pathway for > > > **incremental > > > adoption**, which is particularly problematic for > > > established projects > > > for which adopting ``__array_function__`` would result in > > > breaking > > > changes. > > > - It is no longer possible to use **aliases to NumPy > > > functions** within > > > modules that support overrides. For example, both CuPy and > > > JAX set > > > ``result_type = np.result_type``. > > > - Implementing **fall-back mechanisms** for unimplemented > > > NumPy functions > > > by using NumPy's implementation is hard to get right (but > > > see the > > > `version from dask < > > > https://github.com/dask/dask/pull/5043>`_), because > > > ``__array_function__`` does not present a consistent > > > interface. > > > Converting all arguments of array type requires recursing > > > into generic > > > arguments of the form ``*args, **kwargs``. > > > > > > 2. 
**Limitations on what can be overridden.** > > > ``__array_function__`` has some > > > important gaps, most notably array creation and coercion > > > functions: > > > > > > - **Array creation** routines (e.g., ``np.arange`` and those > > > in > > > ``np.random``) need some other mechanism for indicating what > > > type of > > > arrays to create. `NEP 36 < > > > https://github.com/numpy/numpy/pull/14715>`_ > > > proposed adding optional ``like=`` arguments to functions > > > without > > > existing array arguments. However, we still lack any > > > mechanism to > > > override methods on objects, such as those needed by > > > ``np.random.RandomState``. > > > - **Array conversion** can't reuse the existing coercion > > > functions like > > > ``np.asarray``, because ``np.asarray`` sometimes means > > > "convert to an > > > exact ``np.ndarray``" and other times means "convert to > > > something _like_ > > > a NumPy array." This led to the `NEP 30 > > > <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_ > > > proposal for > > > a separate ``np.duckarray`` function, but this still does > > > not resolve how > > > to cast one duck array into a type matching another duck > > > array. > > > > > > ``get_array_module`` and the ``__array_module__`` protocol > > > ---------------------------------------------------------- > > > > > > We propose a new user-facing mechanism for dispatching to a duck- > > > array > > > implementation, ``numpy.get_array_module``. ``get_array_module`` > > > performs the > > > same type resolution as ``__array_function__`` and returns a > > > module with an API > > > promised to match the standard interface of ``numpy`` that can > > > implement > > > operations on all provided array types. > > > > > > The protocol itself is both simpler and more powerful than > > > ``__array_function__``, because it doesn't need to worry about > > > actually > > > implementing functions. 
We believe it resolves most of the > > > maintainability and > > > functionality limitations of ``__array_function__``. > > > > > > The new protocol is opt-in, explicit and with local control; see > > > :ref:`appendix-design-choices` for discussion on the importance > > > of these design > > > features. > > > > > > The array module contract > > > ========================= > > > > > > Modules returned by ``get_array_module``/``__array_module__`` > > > should make a > > > best effort to implement NumPy's core functionality on new array > > > type(s). > > > Unimplemented functionality should simply be omitted (e.g., > > > accessing an > > > unimplemented function should raise ``AttributeError``). In the > > > future, we > > > anticipate codifying a protocol for requesting restricted subsets > > > of ``numpy``; > > > see :ref:`requesting-restricted-subsets` for more details. > > > > > > How to use ``get_array_module`` > > > =============================== > > > > > > Code that wants to support generic duck arrays should explicitly > > > call > > > ``get_array_module`` to determine an appropriate array module > > > from which to > > > call functions, rather than using the ``numpy`` namespace > > > directly. For > > > example: > > > > > > .. code:: python > > > > > > # calls the appropriate version of np.something for x and y > > > module = np.get_array_module(x, y) > > > module.something(x, y) > > > > > > Both array creation and array conversion are supported, because > > > dispatching is > > > handled by ``get_array_module`` rather than via the types of > > > function > > > arguments. For example, to use random number generation functions > > > or methods, > > > we can simply pull out the appropriate submodule: > > > > > > ..
code:: python > > > > > > def duckarray_add_random(array): > > > module = np.get_array_module(array) > > > noise = module.random.randn(*array.shape) > > > return array + noise > > > > > > We can also write the duck-array ``stack`` function from `NEP 30 > > > <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_, > > > without the need > > > for a new ``np.duckarray`` function: > > > > > > .. code:: python > > > > > > def duckarray_stack(arrays): > > > module = np.get_array_module(*arrays) > > > arrays = [module.asarray(arr) for arr in arrays] > > > shapes = {arr.shape for arr in arrays} > > > if len(shapes) != 1: > > > raise ValueError('all input arrays must have the same > > > shape') > > > expanded_arrays = [arr[module.newaxis, ...] for arr in > > > arrays] > > > return module.concatenate(expanded_arrays, axis=0) > > > > > > By default, ``get_array_module`` will return the ``numpy`` module > > > if no > > > arguments are arrays. This fall-back can be explicitly controlled > > > by providing > > > the ``module`` keyword-only argument. It is also possible to > > > indicate that an > > > exception should be raised instead of returning a default array > > > module by > > > setting ``module=None``. > > > > > > How to implement ``__array_module__`` > > > ===================================== > > > > > > Libraries implementing a duck array type that want to support > > > ``get_array_module`` need to implement the corresponding > > > protocol, > > > ``__array_module__``. This new protocol is based on Python's > > > dispatch protocol > > > for arithmetic, and is essentially a simpler version of > > > ``__array_function__``. > > > > > > Only one argument is passed into ``__array_module__``, a Python > > > collection of > > > unique array types passed into ``get_array_module``, i.e., all > > > arguments with > > > an ``__array_module__`` attribute. 
> > > > > > The special method should either return a namespace with an API > > > matching > > > ``numpy``, or ``NotImplemented``, indicating that it does not > > > know how to > > > handle the operation: > > > > > > .. code:: python > > > > > > class MyArray: > > > def __array_module__(self, types): > > > if not all(issubclass(t, MyArray) for t in types): > > > return NotImplemented > > > return my_array_module > > > > > > Returning custom objects from ``__array_module__`` > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > ``my_array_module`` will typically, but need not always, be a > > > Python module. > > > Returning a custom object (e.g., with functions implemented via > > > ``__getattr__``) may be useful for some advanced use cases. > > > > > > For example, custom objects could allow for partial > > > implementations of duck > > > array modules that fall-back to NumPy (although this is not > > > recommended in > > > general because such fall-back behavior can be error prone): > > > > > > .. code:: python > > > > > > class MyArray: > > > def __array_module__(self, types): > > > if all(issubclass(t, MyArray) for t in types): > > > return ArrayModule() > > > else: > > > return NotImplemented > > > > > > class ArrayModule: > > > def __getattr__(self, name): > > > import base_module > > > return getattr(base_module, name, getattr(numpy, > > > name)) > > > > > > Subclassing from ``numpy.ndarray`` > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > All of the same guidance about well-defined type casting > > > hierarchies from > > > NEP-18 still applies. ``numpy.ndarray`` itself contains a > > > matching > > > implementation of ``__array_module__``, which is convenient for > > > subclasses: > > > > > > ..
code:: python > > > > > > class ndarray: > > > def __array_module__(self, types): > > > if all(issubclass(t, ndarray) for t in types): > > > return numpy > > > else: > > > return NotImplemented > > > > > > NumPy's internal machinery > > > ========================== > > > > > > The type resolution rules of ``get_array_module`` follow the same > > > model as > > > Python and NumPy's existing dispatch protocols: subclasses are > > > called before > > > super-classes, and otherwise left to right. ``__array_module__`` > > > is guaranteed > > > to be called only a single time on each unique type. > > > > > > The actual implementation of `get_array_module` will be in C, but > > > should be > > > equivalent to this Python code: > > > > > > .. code:: python > > > > > > def get_array_module(*arrays, default=numpy): > > > implementing_arrays, types = > > > _implementing_arrays_and_types(arrays) > > > if not implementing_arrays and default is not None: > > > return default > > > for array in implementing_arrays: > > > module = array.__array_module__(types) > > > if module is not NotImplemented: > > > return module > > > raise TypeError("no common array module found") > > > > > > def _implementing_arrays_and_types(relevant_arrays): > > > types = [] > > > implementing_arrays = [] > > > for array in relevant_arrays: > > > t = type(array) > > > if t not in types and hasattr(t, '__array_module__'): > > > types.append(t) > > > # Subclasses before superclasses, otherwise left > > > to right > > > index = len(implementing_arrays) > > > for i, old_array in > > > enumerate(implementing_arrays): > > > if issubclass(t, type(old_array)): > > > index = i > > > break > > > implementing_arrays.insert(index, array) > > > return implementing_arrays, types > > > > > > Relationship with ``__array_ufunc__`` and ``__array_function__`` > > > ---------------------------------------------------------------- > > > > > > These older protocols have distinct use-cases and should remain > > > 
=============================================================== > > > > > > ``__array_module__`` is intended to resolve limitations of > > > ``__array_function__``, so it is natural to consider whether it > > > could entirely > > > replace ``__array_function__``. This would offer dual benefits: > > > (1) simplifying > > > the user-story about how to override NumPy and (2) removing the > > > slowdown > > > associated with checking for dispatch when calling every NumPy > > > function. > > > > > > However, ``__array_module__`` and ``__array_function__`` are > > > pretty different > > > from a user perspective: the former requires explicit calls to > > > ``get_array_module``, > > > rather than simply reusing original ``numpy`` functions. This is > > > probably fine > > > for *libraries* that rely on duck-arrays, but may be > > > frustratingly verbose for > > > interactive use. > > > > > > Some of the dispatching use-cases for ``__array_ufunc__`` are > > > also solved by > > > ``__array_module__``, but not all of them. For example, it is > > > still useful to > > > be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in > > > a generic way > > > on non-NumPy arrays (e.g., with dask.array). > > > > > > Given their existing adoption and distinct use cases, we don't > > > think it makes > > > sense to remove or deprecate ``__array_function__`` and > > > ``__array_ufunc__`` at > > > this time. > > > > > > Mixin classes to implement ``__array_function__`` and > > > ``__array_ufunc__`` > > > ================================================================= > > > ======== > > > > > > Despite the user-facing differences, ``__array_module__`` and a > > > module > > > implementing NumPy's API still contain sufficient functionality > > > needed to > > > implement dispatching with the existing duck array protocols.
> > > > > > For example, the following mixin classes would provide sensible > > > defaults for > > > these special methods in terms of ``get_array_module`` and > > > ``__array_module__``: > > > > > > .. code:: python > > > > > > class ArrayUfuncFromModuleMixin: > > > > > > def __array_ufunc__(self, ufunc, method, *inputs, > > > **kwargs): > > > arrays = inputs + kwargs.get('out', ()) > > > try: > > > array_module = np.get_array_module(*arrays) > > > except TypeError: > > > return NotImplemented > > > > > > try: > > > # Note this may have false positive matches, if > > > ufunc.__name__ > > > # matches the name of a ufunc defined by NumPy. > > > Unfortunately > > > # there is no way to determine in which module a > > > ufunc was > > > # defined. > > > new_ufunc = getattr(array_module, ufunc.__name__) > > > except AttributeError: > > > return NotImplemented > > > > > > try: > > > callable = getattr(new_ufunc, method) > > > except AttributeError: > > > return NotImplemented > > > > > > return callable(*inputs, **kwargs) > > > > > > class ArrayFunctionFromModuleMixin: > > > > > > def __array_function__(self, func, types, args, kwargs): > > > array_module = self.__array_module__(types) > > > if array_module is NotImplemented: > > > return NotImplemented > > > > > > # Traverse submodules to find the appropriate > > > function > > > modules = func.__module__.split('.') > > > assert modules[0] == 'numpy' > > > for submodule in modules[1:]: > > > module = getattr(module, submodule, None) > > > new_func = getattr(module, func.__name__, None) > > > if new_func is None: > > > return NotImplemented > > > > > > return new_func(*args, **kwargs) > > > > > > To make it easier to write duck arrays, we could also add these > > > mixin classes > > > into ``numpy.lib.mixins`` (but the examples above may suffice). 
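One nit on the ``ArrayFunctionFromModuleMixin`` sketch quoted above: ``module`` is never initialized before the submodule traversal, so the loop should start from ``array_module``. A self-contained, corrected sketch of just that traversal (with stand-in objects, since none of this machinery exists in NumPy yet):

```python
import types

class ArrayFunctionFromModuleMixin:
    def __array_function__(self, func, arg_types, args, kwargs):
        array_module = self.__array_module__(arg_types)
        if array_module is NotImplemented:
            return NotImplemented
        # Traverse submodules to find the appropriate function,
        # starting from the resolved module (the missing step).
        parts = func.__module__.split('.')
        assert parts[0] == 'numpy'
        module = array_module
        for submodule in parts[1:]:
            module = getattr(module, submodule, None)
        new_func = getattr(module, func.__name__, None)
        if new_func is None:
            return NotImplemented
        return new_func(*args, **kwargs)

# Stand-in duck module exposing an fft submodule.
duck_module = types.SimpleNamespace(
    fft=types.SimpleNamespace(rfft=lambda x: ('duck-rfft', x)))

class MyArray(ArrayFunctionFromModuleMixin):
    def __array_module__(self, arg_types):
        return duck_module

def rfft(x):            # stands in for numpy.fft.rfft
    raise NotImplementedError
rfft.__module__ = 'numpy.fft'

arr = MyArray()
result = arr.__array_function__(rfft, (MyArray,), (arr,), {})
assert result[0] == 'duck-rfft'
```

The same pattern resolves ``numpy.linalg.svd`` and friends, since dotted module paths are walked attribute by attribute.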
> > > > > > Alternatives considered > > > ----------------------- > > > > > > Naming > > > ====== > > > > > > We like the name ``__array_module__`` because it mirrors the > > > existing > > > ``__array_function__`` and ``__array_ufunc__`` protocols. Another > > > reasonable > > > choice could be ``__array_namespace__``. > > > > > > It is less clear what the NumPy function that calls this protocol > > > should be > > > called (``get_array_module`` in this proposal). Some possible > > > alternatives: > > > ``array_module``, ``common_array_module``, > > > ``resolve_array_module``, > > > ``get_namespace``, ``get_numpy``, ``get_numpylike_module``, > > > ``get_duck_array_module``. > > > > > > .. _requesting-restricted-subsets: > > > > > > Requesting restricted subsets of NumPy's API > > > ============================================ > > > > > > Over time, NumPy has accumulated a very large API surface, with > > > over 600 > > > attributes in the top level ``numpy`` module alone. It is > > > unlikely that any > > > duck array library could or would want to implement all of these > > > functions and > > > classes, because the frequently used subset of NumPy is much > > > smaller. > > > > > > We think it would be a useful exercise to define "minimal" > > > subset(s) of NumPy's > > > API, omitting rarely used or non-recommended functionality. For > > > example, > > > minimal NumPy might include ``stack``, but not the other stacking > > > functions > > > ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This > > > could clearly > > > indicate to duck array authors and users what functionality is > > > core and what > > > functionality they can skip. > > > > > > Support for requesting a restricted subset of NumPy's API would > > > be a natural > > > feature to include in ``get_array_module`` and > > > ``__array_module__``, e.g., > > > > > > ..
code:: python > > > > > > # array_module is only guaranteed to contain "minimal" NumPy > > > array_module = np.get_array_module(*arrays, > > > request='minimal') > > > > > > To facilitate testing with NumPy and use with any valid duck > > > array library, > > > NumPy itself would return restricted versions of the ``numpy`` > > > module when > > > ``get_array_module`` is called only on NumPy arrays. Omitted > > > functions would > > > simply not exist. > > > > > > Unfortunately, we have not yet figured out what these restricted > > > subsets should > > > be, so it doesn't make sense to do this yet. When/if we do, we > > > could either add > > > new keyword arguments to ``get_array_module`` or add new top > > > level functions, > > > e.g., ``get_minimal_array_module``. We would also need to add > > > either a new > > > protocol patterned off of ``__array_module__`` (e.g., > > > ``__array_module_minimal__``), or could add an optional second > > > argument to > > > ``__array_module__`` (catching errors with ``try``/``except``). > > > > > > A new namespace for implicit dispatch > > > ===================================== > > > > > > Instead of supporting overrides in the main `numpy` namespace > > > with > > > ``__array_function__``, we could create a new opt-in namespace, > > > e.g., > > > ``numpy.api``, with versions of NumPy functions that support > > > dispatching. These > > > overrides would need new opt-in protocols, e.g., > > > ``__array_function_api__`` > > > patterned off of ``__array_function__``. > > > > > > This would resolve the biggest limitations of > > > ``__array_function__`` by being > > > opt-in and would also allow for unambiguously overriding > > > functions like > > > ``asarray``, because ``np.api.asarray`` would always mean > > > "convert an > > > array-like object." 
But it wouldn't solve all the dispatching > > > needs met by > > > ``__array_module__``, and would leave us with supporting a > > > considerably more > > > complex protocol both for array users and implementors. > > > > > > We could potentially implement such a new namespace *via* the > > > ``__array_module__`` protocol. Certainly some users would find > > > this convenient, > > > because it is slightly less boilerplate. But this would leave > > > users with a > > > confusing choice: when should they use `get_array_module` vs. > > > `np.api.something`? Also, we would have to add and maintain a > > > whole new module, > > > which is considerably more expensive than merely adding a > > > function. > > > > > > Dispatching on both types and arrays instead of only types > > > ========================================================== > > > > > > Instead of supporting dispatch only via unique array types, we > > > could also > > > support dispatch via array objects, e.g., by passing an > > > ``arrays`` argument as > > > part of the ``__array_module__`` protocol. This could potentially > > > be useful for > > > dispatch for arrays with metadata, such as provided by Dask and > > > Pint, but would > > > impose costs in terms of type safety and complexity. > > > > > > For example, a library that supports arrays on both CPUs and GPUs > > > might decide > > > on which device to create new arrays from functions like > > > ``ones`` based on > > > input arguments: > > > > > > ..
code:: python > > > > > > class Array: > > > def __array_module__(self, types, arrays): > > > useful_arrays = tuple(a for a in arrays if isinstance(a, > > > Array)) > > > if not useful_arrays: > > > return NotImplemented > > > prefer_gpu = any(a.prefer_gpu for a in useful_arrays) > > > return ArrayModule(prefer_gpu) > > > > > > class ArrayModule: > > > def __init__(self, prefer_gpu): > > > self.prefer_gpu = prefer_gpu > > > > > > def __getattr__(self, name): > > > import base_module > > > base_func = getattr(base_module, name) > > > return functools.partial(base_func, > > > prefer_gpu=self.prefer_gpu) > > > > > > This might be useful, but it's not clear if we really need it. > > > Pint seems to > > > get along OK without any explicit array creation routines > > > (favoring > > > multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for > > > the most part > > > Dask is also OK with existing ``__array_function__`` style > > > overrides (e.g., > > > favoring ``np.ones_like`` over ``np.ones``). Choosing whether to > > > place an array > > > on the CPU or GPU could be solved by `making array creation lazy > > > <https://github.com/google/jax/pull/1668>`_. > > > > > > .. _appendix-design-choices: > > > > > > Appendix: design choices for API overrides > > > ------------------------------------------ > > > > > > There is a large range of possible design choices for overriding > > > NumPy's API. > > > Here we discuss three major axes of the design decision that > > > guided our design > > > for ``__array_module__``. > > > > > > Opt-in vs. opt-out for users > > > ============================ > > > > > > The ``__array_ufunc__`` and ``__array_function__`` protocols > > > provide a > > > mechanism for overriding NumPy functions *within NumPy's existing > > > namespace*. > > > This means that users need to explicitly opt-out if they do not > > > want any > > > overridden behavior, e.g., by casting arrays with > > > ``np.asarray()``.
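The device-preference idea quoted above can be tried out end-to-end with a toy stand-in for ``base_module`` (everything here is hypothetical illustration of the arrays-based protocol variant, not proposed API):

```python
import functools
import types

# Toy stand-in for `base_module`: records where an array would be placed.
base_module = types.SimpleNamespace(
    ones=lambda shape, prefer_gpu=False: (
        'ones', shape, 'gpu' if prefer_gpu else 'cpu'))

class Array:
    def __init__(self, prefer_gpu=False):
        self.prefer_gpu = prefer_gpu

    def __array_module__(self, array_types, arrays):
        useful_arrays = tuple(a for a in arrays if isinstance(a, Array))
        if not useful_arrays:
            return NotImplemented
        # Any GPU-preferring input makes the whole module prefer the GPU.
        prefer_gpu = any(a.prefer_gpu for a in useful_arrays)
        return ArrayModule(prefer_gpu)

class ArrayModule:
    def __init__(self, prefer_gpu):
        self.prefer_gpu = prefer_gpu

    def __getattr__(self, name):
        # Bind the resolved device preference into every creation routine.
        base_func = getattr(base_module, name)
        return functools.partial(base_func, prefer_gpu=self.prefer_gpu)

a, b = Array(prefer_gpu=False), Array(prefer_gpu=True)
module = a.__array_module__((Array,), (a, b))
assert module.ones((3,)) == ('ones', (3,), 'gpu')
```

Note how the *instances*, not just the types, feed into the returned module: that is exactly the extra power (and the extra coupling) this variant buys.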
> > > > > > In theory, this approach lowers the barrier for adopting these > > > protocols in > > > user code and libraries, because code that uses the standard > > > NumPy namespace is > > > automatically compatible. But in practice, this hasn't worked > > > out. For example, > > > most well-maintained libraries that use NumPy follow the best > > > practice of > > > casting all inputs with ``np.asarray()``, which they would have > > > to explicitly > > > relax to use ``__array_function__``. Our experience has been that > > > making a > > > library compatible with a new duck array type typically requires > > > at least a > > > small amount of work to accommodate differences in the data model > > > and operations > > > that can be implemented efficiently. > > > > > > These opt-out approaches also considerably complicate backwards > > > compatibility > > > for libraries that adopt these protocols, because by opting in as > > > a library > > > they also opt-in their users, whether they expect it or not. For > > > winning over > > > libraries that have been unable to adopt ``__array_function__``, > > > an opt-in > > > approach seems like a must. > > > > > > Explicit vs. implicit choice of implementation > > > ============================================== > > > > > > Both ``__array_ufunc__`` and ``__array_function__`` have implicit > > > control over > > > dispatching: the dispatched functions are determined via the > > > appropriate > > > protocols in every function call. This generalizes well to > > > handling many > > > different types of objects, as evidenced by its use for > > > implementing arithmetic > > > operators in Python, but it has two downsides: > > > > > > 1. *Speed*: it imposes additional overhead in every function > > > call, because each > > > function call needs to inspect each of its arguments for > > > overrides. This is > > > why arithmetic on builtin Python numbers is slow. > > > 2. 
*Readability*: it is no longer immediately evident to readers > > > of code what > > > happens when a function is called, because the function's > > > implementation > > > could be overridden by any of its arguments. > > > > > > In contrast, importing a new library (e.g., ``import dask.array > > > as da``) with > > > an API matching NumPy is entirely explicit. There is no overhead > > > from dispatch > > > or ambiguity about which implementation is being used. > > > > > > Explicit and implicit choice of implementations are not mutually > > > exclusive > > > options. Indeed, most implementations of NumPy API overrides via > > > ``__array_function__`` that we are familiar with (namely, dask, > > > CuPy and > > > sparse, but not Pint) also include an explicit way to use their > > > version of > > > NumPy's API by importing a module directly (``dask.array``, > > > ``cupy`` or > > > ``sparse``, respectively). > > > > > > Local vs. non-local vs. global control > > > ====================================== > > > > > > The final design axis is how users control the choice of API: > > > > > > - **Local control**, as exemplified by multiple dispatch and > > > Python protocols for > > > arithmetic, determines which implementation to use either by > > > checking types > > > or calling methods on the direct arguments of a function. > > > - **Non-local control** such as `np.errstate > > > < > > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html>`_ > > > overrides behavior with global-state via function decorators or > > > context-managers. Control is determined hierarchically, via the > > > inner-most > > > context. > > > - **Global control** provides a mechanism for users to set > > > default behavior, > > > either via function calls or configuration files. For example, > > > matplotlib > > > allows setting a global choice of plotting backend.
> > > > > > Local control is generally considered a best practice for API > > > design, because > > > control flow is entirely explicit, which makes it the easiest to > > > understand. > > > Non-local and global control are occasionally used, but generally > > > either due to > > > ignorance or a lack of better alternatives. > > > > > > In the case of duck typing for NumPy's public API, we think non- > > > local or global > > > control would be mistakes, mostly because they **don't compose > > > well**. If one > > > library sets/needs one set of overrides and then internally calls > > > a routine > > > that expects another set of overrides, the resulting behavior may > > > be very > > > surprising. Higher order functions are especially problematic, > > > because the > > > context in which functions are evaluated may not be the context > > > in which they > > > are defined. > > > > > > One class of override use cases where we think non-local and > > > global control are > > > appropriate is for choosing a backend system that is guaranteed > > > to have an > > > entirely consistent interface, such as a faster alternative > > > implementation of > > > ``numpy.fft`` on NumPy arrays. However, these are out of scope > > > for the current > > > proposal, which is focused on duck arrays. > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion@python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion