Hi Hameer,

I'm confused: isn't your reference array just `self`?
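To spell out where my confusion comes from, here is a rough sketch of how I currently read the proposal. The `FakeDaskArray` class and the `array_reference` keyword below are made up for illustration only; neither is part of NEP 18 as written:

    import numpy as np

    class FakeDaskArray:
        """Stand-in for a chunked, Dask-like array implementing the protocol."""
        def __array_function__(self, func, types, args, kwargs):
            # A real implementation would use chunk sizes and other
            # metadata stored on `self` to build the result.
            return '{} handled using metadata on {}'.format(
                func.__name__, type(self).__name__)

    x = FakeDaskArray()

    # As I read the proposal, np.arange(5, array_reference=x) would end up
    # making a call along these lines:
    result = x.__array_function__(np.arange, frozenset({FakeDaskArray}),
                                  (5,), {'array_reference': x})

    # ...but inside __array_function__ the "reference array" is then simply
    # `self` (i.e. `x`), so passing it again as a keyword seems redundant.
    print(result)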
All the best,
Marten

On Wed, Jun 27, 2018 at 2:27 AM, Hameer Abbasi <einstein.edi...@gmail.com> wrote:

On 27. Jun 2018 at 07:48, Stephan Hoyer <sho...@gmail.com> wrote:

After much discussion (and the addition of three new co-authors!), I'm pleased
to present a significantly revised version of NumPy Enhancement Proposal 18:
A dispatch mechanism for NumPy's high level array functions:
http://www.numpy.org/neps/nep-0018-array-function-protocol.html

The full text is also included below.

Best,
Stephan

===========================================================
A dispatch mechanism for NumPy's high level array functions
===========================================================

:Author: Stephan Hoyer <sho...@google.com>
:Author: Matthew Rocklin <mrock...@gmail.com>
:Author: Marten van Kerkwijk <m...@astro.utoronto.ca>
:Author: Hameer Abbasi <hameerabb...@yahoo.com>
:Author: Eric Wieser <wieser.e...@gmail.com>
:Status: Draft
:Type: Standards Track
:Created: 2018-05-29

Abstract
--------

We propose the ``__array_function__`` protocol, to allow arguments of NumPy
functions to define how that function operates on them. This will allow
using NumPy as a high level API for efficient multi-dimensional array
operations, even with array implementations that differ greatly from
``numpy.ndarray``.

Detailed description
--------------------

NumPy's high level ndarray API has been implemented several times
outside of NumPy itself for different architectures, such as for GPU
arrays (CuPy), sparse arrays (scipy.sparse, pydata/sparse) and parallel
arrays (Dask array), as well as various NumPy-like implementations in
deep learning frameworks like TensorFlow and PyTorch.

Similarly, there are many projects that build on top of the NumPy API
for labeled and indexed arrays (XArray), automatic differentiation
(Autograd, Tangent), masked arrays (numpy.ma), physical units
(astropy.units, pint, unyt), etc., that add additional functionality on
top of the NumPy API. Most of these projects also implement a close
variation of NumPy's high level API.

We would like to be able to use these libraries together, for example we
would like to be able to place a CuPy array within XArray, or perform
automatic differentiation on Dask array code. This would be easier to
accomplish if code written for NumPy ndarrays could also be used by
other NumPy-like projects.

For example, we would like for the following code example to work
equally well with any NumPy-like array object:

.. code:: python

    def f(x):
        y = np.tensordot(x, x.T)
        return np.mean(np.exp(y))

Some of this is possible today with various protocol mechanisms within
NumPy.

- The ``np.exp`` function checks the ``__array_ufunc__`` protocol
- The ``.T`` method works using Python's method dispatch
- The ``np.mean`` function explicitly checks for a ``.mean`` method on
  the argument

However, other functions, like ``np.tensordot``, do not dispatch, and
instead are likely to coerce to a NumPy array (using the ``__array__``
protocol) or err outright. To achieve enough coverage of the NumPy API
to support downstream projects like XArray and Autograd we want to
support *almost all* functions within NumPy, which calls for a more
far-reaching protocol than just ``__array_ufunc__``.
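To make the coercion problem concrete, consider the following minimal sketch;
the ``LoggingArray`` class here is invented purely for illustration and is not
part of this proposal:

.. code:: python

    import numpy as np

    class LoggingArray:
        """Toy wrapper that reports when NumPy coerces it to an ndarray."""
        def __init__(self, data):
            self.data = np.asarray(data)

        @property
        def T(self):
            return LoggingArray(self.data.T)

        def __array__(self, dtype=None):
            print('coerced to ndarray')
            return self.data if dtype is None else self.data.astype(dtype)

    x = LoggingArray(np.eye(3))
    y = np.tensordot(x, x.T)  # prints 'coerced to ndarray' twice
    print(type(y))            # <class 'numpy.ndarray'>, not LoggingArray

Once ``np.tensordot`` has silently converted its inputs, any information
attached to the original objects (chunking, units, masks, ...) is lost.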
We would like a protocol that allows arguments of a NumPy function to take
control and divert execution to another function (for example a GPU or
parallel implementation) in a way that is safe and consistent across
projects.

Implementation
--------------

We propose adding support for a new protocol in NumPy,
``__array_function__``.

This protocol is intended to be a catch-all for NumPy functionality that
is not covered by the ``__array_ufunc__`` protocol for universal functions
(like ``np.exp``). The semantics are very similar to ``__array_ufunc__``,
except the operation is specified by an arbitrary callable object rather
than a ufunc instance and method.

A prototype implementation can be found in
`this notebook <https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006>`_.

The interface
~~~~~~~~~~~~~

We propose the following signature for implementations of
``__array_function__``:

.. code-block:: python

    def __array_function__(self, func, types, args, kwargs)

- ``func`` is an arbitrary callable exposed by NumPy's public API,
  which was called in the form ``func(*args, **kwargs)``.
- ``types`` is a ``frozenset`` of unique argument types from the original
  NumPy function call that implement ``__array_function__``.
- The tuple ``args`` and dict ``kwargs`` are directly passed on from the
  original call.

Unlike ``__array_ufunc__``, there are no high-level guarantees about the
type of ``func``, or about which of ``args`` and ``kwargs`` may contain
objects implementing the array API.

As a convenience for ``__array_function__`` implementors, ``types``
provides all argument types with an ``'__array_function__'`` attribute.
This allows downstream implementations to quickly determine if they are
likely able to support the operation. A ``frozenset`` is used to ensure
that ``__array_function__`` implementations cannot rely on the iteration
order of ``types``, which would facilitate violating the well-defined
"Type casting hierarchy" described in
`NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_.

Example for a project implementing the NumPy API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most implementations of ``__array_function__`` will start with two
checks:

1. Is the given function something that we know how to overload?
2. Are all arguments of a type that we know how to handle?

If these conditions hold, ``__array_function__`` should return the result
from calling its implementation for ``func(*args, **kwargs)``. Otherwise,
it should return the sentinel value ``NotImplemented``, indicating that
the function is not implemented by these types. This is preferable to
raising ``TypeError`` directly, because it gives *other* arguments the
opportunity to define the operations.

There are no general requirements on the return value from
``__array_function__``, although most sensible implementations should
probably return array(s) with the same type as one of the function's
arguments. If/when Python gains
`typing support for protocols <https://www.python.org/dev/peps/pep-0544/>`_
and NumPy adds static type annotations, the ``@overload`` implementation
for ``SupportsArrayFunction`` will indicate a return type of ``Any``.

It may also be convenient to define a custom decorator (``implements``
below) for registering ``__array_function__`` implementations.
.. code:: python

    HANDLED_FUNCTIONS = {}

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            # Note: this allows subclasses that don't override
            # __array_function__ to handle MyArray objects
            if not all(issubclass(t, MyArray) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    def implements(numpy_function):
        """Register an __array_function__ implementation for MyArray objects."""
        def decorator(func):
            HANDLED_FUNCTIONS[numpy_function] = func
            return func
        return decorator

    @implements(np.concatenate)
    def concatenate(arrays, axis=0, out=None):
        ...  # implementation of concatenate for MyArray objects

    @implements(np.broadcast_to)
    def broadcast_to(array, shape):
        ...  # implementation of broadcast_to for MyArray objects

Note that it is not required for ``__array_function__`` implementations to
include *all* of the corresponding NumPy function's optional arguments
(e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument).
Optional arguments are only passed in to ``__array_function__`` if they
were explicitly used in the NumPy function call.

Necessary changes within the NumPy codebase itself
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This will require two changes within the NumPy codebase:

1. A function to inspect available inputs, look for the
   ``__array_function__`` attribute on those inputs, and call those
   methods appropriately until one succeeds. This needs to be fast in the
   common all-NumPy case, and have acceptable performance (no worse than
   linear time) even if the number of overloaded inputs is large (e.g.,
   as might be the case for ``np.concatenate``).

   This is one additional function of moderate complexity.

2. Calling this function within all relevant NumPy functions.

   This affects many parts of the NumPy codebase, although with very low
   complexity.

Finding and calling the right ``__array_function__``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to
search through ``*args`` and ``**kwargs`` for all appropriate inputs
that might have the ``__array_function__`` attribute. Then we need to
select among those possible methods and execute the right one.
Negotiating between several possible implementations can be complex.

Finding arguments
'''''''''''''''''

Valid arguments may be directly in the ``*args`` and ``**kwargs``, such
as in the case for ``np.tensordot(left, right, out=out)``, or they may
be nested within lists or dictionaries, such as in the case of
``np.concatenate([x, y, z])``. This can be problematic for two reasons:

1. Some functions are given long lists of values, and traversing them
   might be prohibitively expensive.
2. Some functions may have arguments that we don't want to inspect, even
   if they have the ``__array_function__`` method.

To resolve these issues, NumPy functions should explicitly indicate which
of their arguments may be overloaded, and how these arguments should be
checked. As a rule, this should include all arguments documented as either
``array_like`` or ``ndarray``.
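For contrast, a fully generic search over ``*args`` and ``**kwargs``
(including nested lists and dictionaries) might look like the sketch below.
This helper is purely illustrative and not part of the proposal; for a call
like ``np.concatenate(long_list_of_arrays)`` it has to visit every element,
which is exactly the cost that being explicit avoids.

.. code:: python

    def naive_find_overloaded_args(args, kwargs):
        """Collect one argument per unique type defining __array_function__,
        by walking every argument, including nested sequences and dicts."""
        seen_types = set()
        overloaded_args = []

        def visit(obj):
            if hasattr(type(obj), '__array_function__'):
                if type(obj) not in seen_types:
                    seen_types.add(type(obj))
                    overloaded_args.append(obj)
            elif isinstance(obj, (list, tuple)):
                for item in obj:
                    visit(item)
            elif isinstance(obj, dict):
                for value in obj.values():
                    visit(value)

        for arg in args:
            visit(arg)
        for value in kwargs.values():
            visit(value)
        return overloaded_args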
We propose to make this explicit by writing "dispatcher" functions for each
overloaded NumPy function:

- These functions will be called with the exact same arguments that were
  passed into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``),
  and should return an iterable of arguments to check for overrides.
- Dispatcher functions are required to share the exact same positional,
  optional and keyword-only arguments as their corresponding NumPy
  functions. Otherwise, valid invocations of a NumPy function could result
  in an error when calling its dispatcher.
- Because default *values* for keyword arguments do not have
  ``__array_function__`` attributes, by convention we set all default
  argument values to ``None``. This reduces the likelihood of signatures
  falling out of sync, and minimizes extraneous information in the
  dispatcher. The only exception should be cases where the argument value
  in some way affects dispatching, which should be rare.

An example of the dispatcher for ``np.concatenate`` may be instructive:

.. code:: python

    def _concatenate_dispatcher(arrays, axis=None, out=None):
        for array in arrays:
            yield array
        if out is not None:
            yield out

The concatenate dispatcher is written as a generator function, which allows
it to potentially include the value of the optional ``out`` argument without
needing to create a new sequence with the (potentially long) list of objects
to be concatenated.

Trying ``__array_function__`` methods until the right one works
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Many arguments may implement the ``__array_function__`` protocol. Some
of these may decide that, given the available inputs, they are unable to
determine the correct result. How do we call the right one? If several
are valid, then which has precedence?

For the most part, the rules for dispatch with ``__array_function__``
match those for ``__array_ufunc__`` (see
`NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_).
In particular:

- NumPy will gather implementations of ``__array_function__`` from all
  specified inputs and call them in order: subclasses before
  superclasses, and otherwise left to right. Note that in some edge cases
  involving subclasses, this differs slightly from the
  `current behavior <https://bugs.python.org/issue30140>`_ of Python.
- Implementations of ``__array_function__`` indicate that they can
  handle the operation by returning any value other than
  ``NotImplemented``.
- If all ``__array_function__`` methods return ``NotImplemented``,
  NumPy will raise ``TypeError``.

One deviation from the current behavior of ``__array_ufunc__`` is that
NumPy will only call ``__array_function__`` on the *first* argument of
each unique type. This matches Python's
`rule for calling reflected methods <https://docs.python.org/3/reference/datamodel.html#object.__ror__>`_,
and this ensures that checking overloads has acceptable performance even
when there are a large number of overloaded arguments. To avoid long-term
divergence between these two dispatch protocols, we should
`also update <https://github.com/numpy/numpy/issues/11306>`_
``__array_ufunc__`` to match this behavior.
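Taken together, these rules suggest an override helper along the lines of
the sketch below (based loosely on the prototype; the helper names and the
error message are illustrative, and the special case for ``numpy.ndarray``
described in the next section is omitted):

.. code:: python

    def _collect_overloaded_args(relevant_args):
        """Pick one argument per unique type that defines __array_function__,
        ordered with subclasses before superclasses, otherwise left to right."""
        overloaded_types = []
        overloaded_args = []
        for arg in relevant_args:
            arg_type = type(arg)
            if arg_type in overloaded_types:
                continue
            if not hasattr(arg_type, '__array_function__'):
                continue
            overloaded_types.append(arg_type)
            # Insert before the first already-collected argument whose type
            # is a superclass of this one, so subclasses are tried first.
            index = len(overloaded_args)
            for i, other in enumerate(overloaded_args):
                if issubclass(arg_type, type(other)):
                    index = i
                    break
            overloaded_args.insert(index, arg)
        return overloaded_args

    def try_array_function_override(func, relevant_args, args, kwargs):
        overloaded_args = _collect_overloaded_args(relevant_args)
        if not overloaded_args:
            return False, None  # no overrides; run the default implementation

        types = frozenset(type(arg) for arg in overloaded_args)
        for arg in overloaded_args:
            result = arg.__array_function__(func, types, args, kwargs)
            if result is not NotImplemented:
                return True, result

        raise TypeError('no implementation of {} found for types {}'
                        .format(func.__name__, list(types)))

The ``array_function_dispatch`` decorator shown below then only needs to call
this helper with the output of the dispatcher function.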
Special handling of ``numpy.ndarray``
'''''''''''''''''''''''''''''''''''''

The use cases for subclasses with ``__array_function__`` are the same as
those with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a
``__array_function__`` method mirroring ``ndarray.__array_ufunc__``:

.. code:: python

    def __array_function__(self, func, types, args, kwargs):
        # Cannot handle items that have __array_function__ other than our own.
        for t in types:
            if (hasattr(t, '__array_function__') and
                    t.__array_function__ is not ndarray.__array_function__):
                return NotImplemented

        # Arguments contain no overrides, so we can safely call the
        # overloaded function again.
        return func(*args, **kwargs)

To avoid infinite recursion, the dispatch rules for ``__array_function__``
also need the same special case they have for ``__array_ufunc__``: any
arguments with an ``__array_function__`` method that is identical to
``numpy.ndarray.__array_function__`` are not called as
``__array_function__`` implementations.

Changes within NumPy functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given a function defining the above behavior, for now call it
``try_array_function_override``, we now need to call that function from
within every relevant NumPy function. This is a pervasive change, but of
fairly simple and innocuous code that should complete quickly and
without effect if no arguments implement the ``__array_function__``
protocol.

In most cases, these functions should be written using the
``array_function_dispatch`` decorator, which also associates dispatcher
functions:

.. code:: python

    def array_function_dispatch(dispatcher):
        """Wrap a function for dispatch with the __array_function__ protocol."""
        def decorator(func):
            @functools.wraps(func)
            def new_func(*args, **kwargs):
                relevant_arguments = dispatcher(*args, **kwargs)
                success, value = try_array_function_override(
                    new_func, relevant_arguments, args, kwargs)
                if success:
                    return value
                return func(*args, **kwargs)
            return new_func
        return decorator

    # example usage
    def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs):
        return (array,)

    @array_function_dispatch(_broadcast_to_dispatcher)
    def broadcast_to(array, shape, subok=False):
        ...  # existing definition of np.broadcast_to

Using a decorator is great! We don't need to change the definitions of
existing NumPy functions, and only need to write a few additional lines
for the dispatcher function. We could even reuse a single dispatcher for
families of functions with the same signature (e.g., ``sum`` and ``prod``).
For such functions, the largest change could be adding a few lines to the
docstring to note which arguments are checked for overloads.

It's particularly worth calling out the decorator's use of
``functools.wraps``:

- This ensures that the wrapped function has the same name and docstring
  as the wrapped NumPy function.
- On Python 3, it also ensures that the decorator function copies the
  original function signature, which is important for introspection based
  tools such as auto-complete. If we care about preserving function
  signatures on Python 2, for the
  `short while longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_
  that NumPy supports Python 2.7, we could do so by adding a vendored
  dependency on the (single-file, BSD licensed)
  `decorator library <https://github.com/micheles/decorator>`_.
- Finally, it ensures that the wrapped function
  `can be pickled <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html>`_.

In a few cases, it would not make sense to use the
``array_function_dispatch`` decorator directly, but writing the override
in terms of ``try_array_function_override`` should still be
straightforward.

- Functions written entirely in C (e.g., ``np.concatenate``) can't use
  decorators, but they could still use a C equivalent of
  ``try_array_function_override``. If performance is not a concern, they
  could also be easily wrapped with a small Python wrapper.
- The ``__call__`` method of ``np.vectorize`` can't be decorated with …

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

I would like to propose that we use ``__array_function__`` in the following
manner for functions that create arrays:

- ``array_reference`` for indicating the "reference array" whose
  ``__array_function__`` implementation will be called. For example,
  ``np.arange(5, array_reference=some_dask_array)``.
- I use a reference in the design rather than a type because for some
  arrays (such as Dask), chunk sizes or other reference data is needed to
  make this work.

I realise that this is a big design decision, so I welcome any input!

Best Regards,
Hameer Abbasi
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion