Re: [Numpy-discussion] NEP: array API standard adoption (NEP 47)

Sebastian Berg Mon, 22 Feb 2021 11:52:29 -0800

On Mon, 2021-02-22 at 20:16 +0100, Ralf Gommers wrote:

<snip>


> > 
> > It seems the current idea is to create a new NumPy array subclass.
> > That
> > sounds good, but I am a bit worried how that is going to interact
> > with
> > actual NumPy arrays.
> > 
> 
> Not a subclass! If you got that impression, I should clarify the
> text. The

Sorry, you do write "wraps a numpy.ndarray", I am not sure why I got
the subclass idea when reading it yesterday.

But, in that case maybe you should just implement it as:

    def somefunc(x):
        # do whatever checks needed here for, e.g., input validation
        # then call the native numpy implementation:
        x = as_numpy_array(x)
        result = np.somefunc(x)
        return as_minimal(x)

and not even use the `._implementation`?  I guess small issue is that
we don't have a way to call `as_ndarray` on all relevant inputs
conveniently.  In most cases it will be straight forward though.

I assume you need your own `as_numpy_array` call, to reject ndarray-
subclasses, that `_implementation` will allow to pass through.


> idea is a standalone class that doesn't inherit from anything, and
> has only
> the methods, attributes and semantics described in the API standard.
> It
> just uses np.asarray under the hood. This will be clearer when we
> have a
> prototype for it, probably within two weeks.
> 

Then I should wait for the prototype, for more discussion :).

> Most SciPy users, will still use NumPy proper.  So SciPy must not
> > return this "random" subclass, when the user did pass in a NumPy
> > array
> > (or even by default).
> > 
> 
> Agreed. ndarray in = ndarray out; new array object in = new array
> object
> out.
> 
> 
> > At that point, what will the SciPy dev do to juggle the fact that
> > you
> > would like to use the new API internally, but the interface must
> > still
> > default to NumPy (and may depend on the input)?
> > This is probably not very tricky, but I am slightly worried about
> > what
> > happens when things get mixed up.  Also if a user passes this
> > "minimal
> > array" into a current NumPy API function, it will often not work if
> > it
> > is a subclass.
> > 
> > ---
> > 
> > Related to that: how important is it to keep that namespace a
> > "minimal"
> > implementation, rather than a "conforming" one?  For example, would
> > you
> > want to reject `numpy_api.array([1, 2, 3, 4], dtype="i,i")`? Or
> > just
> > `dtype="complex128"`?
> > 
> 
> Yes, I'd definitely want to reject that. Format strings are terrible.


Agreed, I guess I am wondering whether we can find a good solution that
does not involve writing stubs around 140 functions with more strict
input validation.
But maybe it is also not particularly difficult or churn to do... Or
even automate, e.g. from the typing stubs.


> 
> Maybe I got the wrong impression though.  Is the aim for a minimal
> > implementation, but you are OK as long as it is a conforming one?
> > 
> 
> In principle we're okay with a conforming one that's a superset of
> what is
> in the standard. But I think we'd only want to do that if creating a
> minimal one turns out to be difficult. Having the minimal required
> set is
> much nicer when one wants to write portable code. Because then you
> can do
> so without checking the docs whether any object/method is in
> "minimal" or
> in "extended".
> 

Right, you would like to have a minimal implementation somewhere.
Having it in NumPy could be convenient, although not strictly
necessary.


<snip>

> 
> > Besides `_implementation` usually does support array subclasses and
> > may
> > still dispatch again internally at this time!
> > 
> 
> What, it dispatches again? That seems very suboptimal. If there's no
> clean
> way to avoid a dispatch, it may make sense to just check array inputs
> for
> the presence of __array_function__ and raise an exception if it's
> present.
> 


I do not think our functions where ever rewritten to only use e.g.
`._implementation()` internally. I am not even quite certain that would
be correct for subclasses.

It is annoying that you may have to struggle with it here to do
something that is different from the implicit dispatchers.  But on the
up-side a clear solution would be helpful in any case.

Cheers,

Sebastian


> It's not just about overhead (that's a minor thing), it's that the
> feature
> does not make much sense in combination with the array_api namespace.
> The
> "get a hold of a new namespace" approach is like __array_module__,
> which
> was an alternative to __array_function__ not an addition to it.
> 
> 
> 
> > ---
> > 
> > I am somewhat worried that getting the promotion (and other quirks)
> > to
> > where you want could be very tricky, unless we are patient enough
> > to
> > wait for NumPy proper to evolve.
> > Hopefully I am just too pessimistic and e.g. a mild form of code
> > duplication can solve all of that.  Probably time and trial-and-
> > error
> > will be the judge on that...
> > 
> 
> I do agree that the different casting rules are the single most
> tricky
> issue implementation wise.
> 
> Cheers,
> Ralf
> 
> 
> 
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > > 
> > > See
> > > 
> >     
> > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> > > for an initial discussion about this topic.
> > > 
> > > Please keep high-level discussion here and detailed comments on
> > > https://github.com/numpy/numpy/pull/18456. Also, you can access a
> > > rendered
> > > version of the NEP from that PR (see PR description for how),
> > > which
> > > may be
> > > helpful.
> > > Cheers,
> > > Ralf
> > > 
> > > 
> > > Abstract
> > > --------
> > > 
> > > We propose to adopt the `Python array API standard`_, developed
> > > by
> > > the
> > > `Consortium for Python Data API Standards`_. Implementing this as
> > > a
> > > separate
> > > new namespace in NumPy will allow authors of libraries which
> > > depend
> > > on NumPy
> > > as well as end users to write code that is portable between NumPy
> > > and
> > > all
> > > other array/tensor libraries that adopt this standard.
> > > 
> > > .. note::
> > > 
> > >     We expect that this NEP will remain in a draft state for
> > > quite a
> > > while.
> > >     Given the large scope we don't expect to propose it for
> > > acceptance any
> > >     time soon; instead, we want to solicit feedback on both the
> > > high-
> > > level
> > >     design and implementation, and learn what needs describing
> > > better
> > > in
> > > this
> > >     NEP or changing in either the implementation or the array API
> > > standard
> > >     itself.
> > > 
> > > 
> > > Motivation and Scope
> > > --------------------
> > > 
> > > Python users have a wealth of choice for libraries and frameworks
> > > for
> > > numerical computing, data science, machine learning, and deep
> > > learning. New
> > > frameworks pushing forward the state of the art in these fields
> > > are
> > > appearing
> > > every year. One unintended consequence of all this activity and
> > > creativity
> > > has been fragmentation in multidimensional array (a.k.a. tensor)
> > > libraries -
> > > which are the fundamental data structure for these fields.
> > > Choices
> > > include
> > > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> > > 
> > > The APIs of each of these libraries are largely similar, but with
> > > enough
> > > differences that it’s quite difficult to write code that works
> > > with
> > > multiple
> > > (or all) of these libraries. The array API standard aims to
> > > address
> > > that
> > > issue, by specifying an API for the most common ways arrays are
> > > constructed
> > > and used. The proposed API is quite similar to NumPy's API, and
> > > deviates
> > > mainly
> > > in places where (a) NumPy made design choices that are inherently
> > > not
> > > portable
> > > to other implementations, and (b) where other libraries
> > > consistently
> > > deviated
> > > from NumPy on purpose because NumPy's design turned out to have
> > > issues or
> > > unnecessary complexity.
> > > 
> > > For a longer discussion on the purpose of the array API standard
> > > we
> > > refer to
> > > the `Purpose and Scope section of the array API standard <
> > >     
> > > https://data-apis.github.io/array-api/latest/purpose_and_scope.html
> > > >`
> > > __
> > > and the two blog posts announcing the formation of the Consortium
> > > [1]_ and
> > > the release of the first draft version of the standard for
> > > community
> > > review
> > > [2]_.
> > > 
> > > The scope of this NEP includes:
> > > 
> > > - Adopting the 2021 version of the array API standard
> > > - Adding a separate namespace, tentatively named
> > > ``numpy.array_api``
> > > - Changes needed/desired outside of the new namespace, for
> > > example
> > > new
> > > dunder
> > >   methods on the ``ndarray`` object
> > > - Implementation choices, and differences between functions in
> > > the
> > > new
> > >   namespace with those in the main ``numpy`` namespace
> > > - A new array object conforming to the array API standard
> > > - Maintenance effort and testing strategy
> > > - Impact on NumPy's total exposed API surface and on other future
> > > and
> > >   under-discussion design choices
> > > - Relation to existing and proposed NumPy array protocols
> > >   (``__array_ufunc__``, ``__array_function__``,
> > > ``__array_module__``).
> > > - Required improvements to existing NumPy functionality
> > > 
> > > Out of scope for this NEP are:
> > > 
> > > - Changes in the array API standard itself. Those are likely to
> > > come
> > > up
> > >   during review of this NEP, but should be upstreamed as needed
> > > and
> > > this NEP
> > >   subsequently updated.
> > > 
> > > 
> > > Usage and Impact
> > > ----------------
> > > 
> > > *This section will be fleshed out later, for now we refer to the
> > > use
> > > cases
> > > given
> > > in* `the array API standard Use Cases section <
> > > https://data-apis.github.io/array-api/latest/use_cases.html>`__
> > > 
> > > In addition to those use cases, the new namespace contains
> > > functionality
> > > that
> > > is widely used and supported by many array libraries. As such, it
> > > is
> > > a good
> > > set of functions to teach to newcomers to NumPy and recommend as
> > > "best
> > > practice". That contrasts with NumPy's main namespace, which
> > > contains
> > > many
> > > functions and objects that have been superceded or we consider
> > > mistakes -
> > > but
> > > that we can't remove because of backwards compatibility reasons.
> > > 
> > > The usage of the ``numpy.array_api`` namespace by downstream
> > > libraries is
> > > intended to enable them to consume multiple kinds of arrays,
> > > *without
> > > having
> > > to have a hard dependency on all of those array libraries*:
> > > 
> > > .. image:: _static/nep-0047-library-dependencies.png
> > > 
> > > Adoption in downstream libraries
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The prototype implementation of the ``array_api`` namespace will
> > > be
> > > used
> > > with
> > > SciPy, scikit-learn and other libraries of interest that depend
> > > on
> > > NumPy, in
> > > order to get more experience with the design and find out if any
> > > important
> > > parts are missing.
> > > 
> > > The pattern to support multiple array libraries is intended to be
> > > something
> > > like::
> > > 
> > >     def somefunc(x, y):
> > >         # Retrieves standard namespace. Raises if x and y have
> > > different
> > >         # namespaces.  See Appendix for possible get_namespace
> > > implementation
> > >         xp = get_namespace(x, y)
> > >         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> > >         return out
> > > 
> > > The ``get_namespace`` call is effectively the library author
> > > opting
> > > in to
> > > using the standard API namespace, and thereby explicitly
> > > supporting
> > > all conforming array libraries.
> > > 
> > > 
> > > The ``asarray`` / ``asanyarray`` pattern
> > > ````````````````````````````````````````
> > > 
> > > Many existing libraries use the same ``asarray`` (or
> > > ``asanyarray``)
> > > pattern
> > > as NumPy itself does; accepting any object that can be coerced
> > > into a
> > > ``np.ndarray``.
> > > We consider this design pattern problematic - keeping in mind the
> > > Zen
> > > of
> > > Python, *"explicit is better than implicit"*, as well as the
> > > pattern
> > > being
> > > historically problematic in the SciPy ecosystem for ``ndarray``
> > > subclasses
> > > and with over-eager object creation. All other array/tensor
> > > libraries
> > > are
> > > more strict, and that works out fine in practice. We would advise
> > > authors of
> > > new libraries to avoid the ``asarray`` pattern. Instead they
> > > should
> > > either
> > > accept just NumPy arrays or, if they want to support multiple
> > > kinds
> > > of
> > > arrays, check if the incoming array object supports the array API
> > > standard
> > > by checking for ``__array_namespace__`` as shown in the example
> > > above.
> > > 
> > > Existing libraries can do such a check as well, and only call
> > > ``asarray`` if
> > > the check fails. This is very similar to the ``__duckarray__``
> > > idea
> > > in
> > > :ref:`NEP30`.
> > > 
> > > 
> > > .. _adoption-application-code:
> > > 
> > > Adoption in application code
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The new namespace can be seen by end users as a cleaned up and
> > > slimmed down
> > > version of NumPy's main namespace. Encouraging end users to use
> > > this
> > > namespace like::
> > > 
> > >     import numpy.array_api as xp
> > > 
> > >     x = xp.linspace(0, 2*xp.pi, num=100)
> > >     y = xp.cos(x)
> > > 
> > > seems perfectly reasonable, and potentially beneficial - users
> > > get
> > > offered
> > > only
> > > one function for each purpose (the one we consider best-
> > > practice),
> > > and they
> > > then write code that is more easily portable to other libraries.
> > > 
> > > 
> > > Backward compatibility
> > > ----------------------
> > > 
> > > No deprecations or removals of existing NumPy APIs or other
> > > backwards
> > > incompatible changes are proposed.
> > > 
> > > 
> > > High-level design
> > > -----------------
> > > 
> > > The array API standard consists of approximately 120 objects, all
> > > of
> > > which
> > > have a direct NumPy equivalent. This figure shows what is
> > > included at
> > > a
> > > high level:
> > > 
> > > .. image:: _static/nep-0047-scope-of-array-API.png
> > > 
> > > The most important changes compared to what NumPy currently
> > > offers
> > > are:
> > > 
> > > - A new array object which:
> > > 
> > >     - conforms to the casting rules and indexing behaviour
> > > specified
> > > by the
> > >       standard,
> > >     - does not have methods other than dunder methods,
> > >     - does not support the full range of NumPy indexing
> > > behaviour.
> > > Advanced
> > >       indexing with integers is not supported. Only boolean
> > > indexing
> > >       with a single (possibly multi-dimensional) boolean array is
> > > supported.
> > >       An indexing expression that selects a single element
> > > returns a
> > > 0-D
> > > array
> > >       rather than a scalar.
> > > 
> > > - Functions in the ``array_api`` namespace:
> > > 
> > >     - do not accept ``array_like`` inputs, only NumPy arrays and
> > > Python
> > > scalars
> > >     - do not support ``__array_ufunc__`` and
> > > ``__array_function__``,
> > >     - use positional-only and keyword-only parameters in their
> > > signatures,
> > >     - have inline type annotations,
> > >     - may have minor changes to signatures and semantics of
> > > individual
> > >       functions compared to their equivalents already present in
> > > NumPy,
> > >     - only support dtype literals, not format strings or other
> > > ways
> > > of
> > >       specifying dtypes
> > > 
> > > - DLPack_ support will be added to NumPy,
> > > - New syntax for "device support" will be added, through a
> > > ``.device``
> > >   attribute on the new array object, and ``device=`` keywords in
> > > array
> > > creation
> > >   functions in the ``array_api`` namespace,
> > > - Casting rules that differ from those NumPy currently has.
> > > Output
> > > dtypes
> > > can
> > >   be derived from input dtypes (i.e. no value-based casting), and
> > > 0-D
> > > arrays
> > >   are treated like >=1-D arrays.
> > > - Not all dtypes NumPy has are part of the standard. Only
> > > boolean,
> > > signed
> > > and
> > >   unsigned integers, and floating-point dtypes up to ``float64``
> > > are
> > > supported.
> > >   Complex dtypes are expected to be added in the next version of
> > > the
> > > standard.
> > >   Extended precision, string, void, object and datetime dtypes,
> > > as
> > > well as
> > >   structured dtypes, are not included.
> > > 
> > > Improvements to existing NumPy functionality that are needed
> > > include:
> > > 
> > > - Add support for stacks of matrices to some functions in
> > > ``numpy.linalg``
> > >   that are currently missing such support.
> > > - Add the ``keepdims`` keyword to ``np.argmin`` and
> > > ``np.argmax``.
> > > - Add a "never copy" mode to ``np.asarray``.
> > > 
> > > 
> > > Functions in the ``array_api`` namespace
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > Let's start with an example of a function implementation that
> > > shows
> > > the most
> > > important differences with the equivalent function in the main
> > > namespace::
> > > 
> > >     def max(x: array, /, *,
> > >             axis: Optional[Union[int, Tuple[int, ...]]] = None,
> > >             keepdims: bool = False
> > >         ) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.max
> > > <numpy.max>`.
> > >         """
> > >         return np.max._implementation(x, axis=axis,
> > > keepdims=keepdims)
> > > 
> > > This function does not accept ``array_like`` inputs, only
> > > ``ndarray``. There
> > > are multiple reasons for this. Other array libraries all work
> > > like
> > > this.
> > > Letting the user do coercion of lists, generators, or other
> > > foreign
> > > objects
> > > separately results in a cleaner design with less unexpected
> > > behaviour.
> > > It's higher-performance - less overhead from ``asarray`` calls.
> > > Static
> > > typing
> > > is easier. Subclasses will work as expected. And the slight
> > > increase
> > > in
> > > verbosity
> > > because users have to explicitly coerce to ``ndarray`` on rare
> > > occasions
> > > seems like a small price to pay.
> > > 
> > > This function does not support ``__array_ufunc__`` nor
> > > ``__array_function__``.
> > > These protocols serve a similar purpose as the array API standard
> > > module
> > > itself,
> > > but through a different mechanisms. Because only ``ndarray``
> > > instances are
> > > accepted,
> > > dispatching via one of these protocols isn't useful anymore.
> > > 
> > > This function uses positional-only parameters in its signature.
> > > This
> > > makes
> > > code
> > > more portable - writing ``max(x=x, ...)`` is no longer valid,
> > > hence
> > > if other
> > > libraries call the first parameter ``input`` rather than ``x``,
> > > that
> > > is
> > > fine.
> > > The rationale for keyword-only parameters (not shown in the above
> > > example)
> > > is
> > > two-fold: clarity of end user code, and it being easier to extend
> > > the
> > > signature
> > > in the future with keywords in the desired order.
> > > 
> > > This function has inline type annotations. Inline annotations are
> > > far
> > > easier to
> > > maintain than separate stub files. And because the types are
> > > simple,
> > > this
> > > will
> > > not result in a large amount of clutter with type aliases or
> > > unions
> > > like in
> > > the
> > > current stub files NumPy has.
> > > 
> > > 
> > > DLPack support for zero-copy data interchange
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The ability to convert one kind of array into another kind is
> > > valuable, and
> > > indeed necessary when downstream libraries want to support
> > > multiple
> > > kinds of
> > > arrays. This requires a well-specified data exchange protocol.
> > > NumPy
> > > already
> > > supports two of these, namely the buffer protocol (i.e., PEP
> > > 3118),
> > > and
> > > the ``__array_interface__`` (Python side) / ``__array_struct__``
> > > (C
> > > side)
> > > protocol. Both work similarly, letting the "producer" describe
> > > how
> > > the data
> > > is laid out in memory so the "consumer" can construct its own
> > > kind of
> > > array
> > > with a view on that data.
> > > 
> > > DLPack works in a very similar way. The main reasons to prefer
> > > DLPack
> > > over
> > > the options already present in NumPy are:
> > > 
> > > 1. DLPack is the only protocol with device support (e.g., GPUs
> > > using
> > > CUDA or
> > >    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> > > array
> > >    libraries are not. Having one protocol per device isn't
> > > tenable,
> > > hence
> > >    device support is a must.
> > > 2. Widespread support. DLPack has the widest adoption of all
> > > protocols, only
> > >    NumPy is missing support. And the experiences of other
> > > libraries
> > > with it
> > >    are positive. This contrasts with the protocols NumPy does
> > > support, which
> > >    are used very little - when other libraries want to
> > > interoperate
> > > with
> > >    NumPy, they typically use the (more limited, and NumPy-
> > > specific)
> > >    ``__array__`` protocol.
> > > 
> > > Adding support for DLPack to NumPy entails:
> > > 
> > > - Adding a ``ndarray.__dlpack__`` method
> > > - Adding a ``from_dlpack`` function, which takes as input an
> > > object
> > >   supporting ``__dlpack__``, and returns an ``ndarray``.
> > > 
> > > DLPack is currently a ~200 LoC header, and is meant to be
> > > included
> > > directly, so
> > > no external dependency is needed. Implementation should be
> > > straightforward.
> > > 
> > > 
> > > Syntax for device support
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > NumPy itself is CPU-only, so it clearly doesn't have a need for
> > > device
> > > support.
> > > However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> > > support
> > > multiple types of devices: CPU, GPU, TPU, and more exotic
> > > hardware.
> > > To write portable code on systems with multiple devices, it's
> > > often
> > > necessary
> > > to create new arrays on the same device as some other array, or
> > > check
> > > that
> > > two arrays live on the same device. Hence syntax for that is
> > > needed.
> > > 
> > > The array object will have a ``.device`` attribute which enables
> > > comparing
> > > devices of different arrays (they only should compare equal if
> > > both
> > > arrays
> > > are
> > > from the same library and it's the same hardware device).
> > > Furthermore,
> > > ``device=`` keywords in array creation functions are needed. For
> > > example::
> > > 
> > >     def empty(shape: Union[int, Tuple[int, ...]], /, *,
> > >               dtype: Optional[dtype] = None,
> > >               device: Optional[device] = None) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.empty
> > > <numpy.empty>`.
> > >         """
> > >         return np.empty(shape, dtype=dtype, device=device)
> > > 
> > > The implementation for NumPy may be as simple as setting the
> > > device
> > > attribute to
> > > the string ``'cpu'`` and raising an exception if array creation
> > > functions
> > > encounter any other value.
> > > 
> > > 
> > > Dtypes and casting rules
> > > ~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The supported dtypes in this namespace are boolean, 8/16/32/64-
> > > bit
> > > signed
> > > and
> > > unsigned integer, and 32/64-bit floating-point dtypes. These will
> > > be
> > > added
> > > to
> > > the namespace as dtype literals with the expected names (e.g.,
> > > ``bool``,
> > > ``uint16``, ``float64``).
> > > 
> > > The most obvious omissions are the complex dtypes. The rationale
> > > for
> > > the
> > > lack
> > > of complex support in the first version of the array API standard
> > > is
> > > that
> > > several
> > > libraries (PyTorch, MXNet) are still in the process of adding
> > > support
> > > for
> > > complex dtypes. The next version of the standard is expected to
> > > include
> > > ``complex64``
> > > and ``complex128`` (see `this issue <
> > > https://github.com/data-apis/array-api/issues/102>`__
> > > for more details).
> > > 
> > > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword,
> > > is
> > > expected
> > > to only use the dtype literals. Format strings, Python builtin
> > > dtypes, or
> > > string representations of the dtype literals are not accepted -
> > > this
> > > will
> > > improve readability and portability of code at little cost.
> > > 
> > > Casting rules are only defined between different dtypes of the
> > > same
> > > kind.
> > > The
> > > rationale for this is that mixed-kind (e.g., integer to floating-
> > > point)
> > > casting behavior differs between libraries. NumPy's mixed-kind
> > > casting
> > > behavior doesn't need to be changed or restricted, it only needs
> > > to
> > > be
> > > documented that if users use mixed-kind casting, their code may
> > > not
> > > be
> > > portable.
> > > 
> > > .. image:: _static/nep-0047-casting-rules-lattice.png
> > > 
> > > *Type promotion diagram. Promotion between any two types is given
> > > by
> > > their
> > > join on this lattice. Only the types of participating arrays
> > > matter,
> > > not
> > > their values. Dashed lines indicate that behaviour for Python
> > > scalars
> > > is
> > > undefined on overflow. Boolean, integer and floating-point dtypes
> > > are
> > > not
> > > connected, indicating mixed-kind promotion is undefined.*
> > > 
> > > The most important difference between the casting rules in NumPy
> > > and
> > > in the
> > > array API standard is how scalars and 0-dimensional arrays are
> > > handled. In
> > > the standard, array scalars do not exist and 0-dimensional arrays
> > > follow the
> > > same casting rules as higher-dimensional arrays.
> > > 
> > > See the `Type Promotion Rules section of the array API standard <
> > > 
> >     
> > https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html
> > > > `__
> > > for more details.
> > > 
> > > .. note::
> > > 
> > >     It is not clear what the best way is to support the different
> > > casting
> > > rules
> > >     for 0-dimensional arrays and no value-based casting. One
> > > option
> > > may be
> > > to
> > >     implement this second set of casting rules, keep them
> > > private,
> > > mark the
> > >     array API functions with a private attribute that says they
> > > adhere to
> > >     these different rules, and let the casting machinery check
> > > whether for
> > >     that attribute.
> > > 
> > >     This needs discussion.
> > > 
> > > 
> > > Indexing
> > > ~~~~~~~~
> > > 
> > > An indexing expression that would return a scalar with
> > > ``ndarray``,
> > > e.g.
> > > ``arr_2d[0, 0]``, will return a 0-D array with the new array
> > > object.
> > > There
> > > are
> > > several reasons for that: array scalars are largely considered a
> > > design
> > > mistake
> > > which no other array library copied; it works better for non-CPU
> > > libraries
> > > (typically arrays can live on the device, scalars live on the
> > > host);
> > > and
> > > it's
> > > simply a consistent design. To get a Python scalar out of a 0-D
> > > array, one
> > > can
> > > simply use the builtin for the type, e.g. ``float(arr_0d)``.
> > > 
> > > The other `indexing modes in the standard <
> > > 
> >     
> > https://data-apis.github.io/array-api/latest/API_specification/indexing.html
> > > > `__
> > > do work largely the same as they do for ``numpy.ndarray``. One
> > > noteworthy
> > > difference is that clipping in slice indexing (e.g., ``a[:n]``
> > > where
> > > ``n``
> > > is
> > > larger than the size of the first axis) is unspecified behaviour,
> > > because
> > > that kind of check can be expensive on accelerators.
> > > 
> > > The lack of advanced indexing, and boolean indexing being limited
> > > to
> > > a
> > > single
> > > n-D boolean array, is due to those indexing modes not being
> > > suitable
> > > for all
> > > types of arrays or JIT compilation. Their absence does not seem
> > > to be
> > > problematic; if a user or library author wants to use them, they
> > > can
> > > do so
> > > through zero-copy conversion to ``numpy.ndarray``. This will
> > > signal
> > > correctly
> > > to whomever reads the code that it is then NumPy-specific rather
> > > than
> > > portable
> > > to all conforming array types.
> > > 
> > > 
> > > 
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > > 
> > > The array object in the standard does not have methods other than
> > > dunder
> > > methods. The rationale for that is that not all array libraries
> > > have
> > > methods
> > > on their array object (e.g., TensorFlow does not). It also
> > > provides
> > > only a
> > > single way of doing something, rather than have functions and
> > > methods
> > > that
> > > are effectively duplicate.
> > > 
> > > Mixing operations that may produce views (e.g., indexing,
> > > ``nonzero``)
> > > in combination with mutation (e.g., item or slice assignment) is
> > > `explicitly documented in the standard to not be supported <
> > > 
> >     
> > https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html
> > > > `__.
> > > This cannot easily be prohibited in the array object itself;
> > > instead
> > > this
> > > will
> > > be guidance to the user via documentation.
> > > 
> > > The standard current does not prescribe a name for the array
> > > object
> > > itself.
> > > We propose to simply name it ``ndarray``. This is the most
> > > obvious
> > > name, and
> > > because of the separate namespace should not clash with
> > > ``numpy.ndarray``.
> > > 
> > > 
> > > Implementation
> > > --------------
> > > 
> > > .. note::
> > > 
> > >     This section needs a lot more detail, which will gradually be
> > > added when
> > >     the implementation progresses.
> > > 
> > > A prototype of the ``array_api`` namespace can be found in
> > >     
> > > https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api
> > > .
> > > The docstring in its ``__init__.py`` has notes on completeness of
> > > the
> > > implementation. The code for the wrapper functions also contains
> > > ``#
> > > Note:``
> > > comments everywhere there is a difference with the NumPy API.
> > > Two important parts that are not implemented yet are the new
> > > array
> > > object
> > > and
> > > DLPack support. Functions may need changes to ensure the changed
> > > casting
> > > rules
> > > are respected.
> > > 
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > > 
> > > Regarding the array object implementation, we plan to start with
> > > a
> > > regular
> > > Python class that wraps a ``numpy.ndarray`` instance. Attributes
> > > and
> > > methods
> > > can forward to that wrapped instance, applying input validation
> > > and
> > > implementing changed behaviour as needed.
> > > 
> > > The casting rules are probably the most challenging part. The in-
> > > progress
> > > dtype system refactor (NEPs 40-43) should make implementing the
> > > correct
> > > casting
> > > behaviour easier - it is already moving away from value-based
> > > casting
> > > for
> > > example.
> > > 
> > > 
> > > The dtype objects
> > > ~~~~~~~~~~~~~~~~~
> > > 
> > > We must be able to compare dtypes for equality, and expressions
> > > like
> > > these
> > > must
> > > be possible::
> > > 
> > >     np.array_api.some_func(..., dtype=x.dtype)
> > > 
> > > The above implies it would be nice to have ``np.array_api.float32
> > > ==
> > > np.array_api.ndarray(...).dtype``.
> > > 
> > > Dtypes should not be assumed to have a class hierarchy by users,
> > > however we
> > > are
> > > free to implement it with a class hierarchy if that's convenient.
> > > We
> > > considered
> > > the following options to implement dtype objects:
> > > 
> > > 1. Alias dtypes to those in the main namespace. E.g.,
> > > ``np.array_api.float32 =
> > >    np.float32``.
> > > 2. Make the dtypes instances of ``np.dtype``. E.g.,
> > > ``np.array_api.float32 =
> > >    np.dtype(np.float32)``.
> > > 3. Create new singleton classes with only the required
> > > methods/attributes
> > >    (currently just ``__eq__``).
> > > 
> > > It seems like (2) would be easiest from the perspective of
> > > interacting with
> > > functions outside the main namespace. And (3) would adhere best
> > > to
> > > the
> > > standard.
> > > 
> > > TBD: the standard does not yet have a good way to inspect
> > > properties
> > > of a
> > > dtype, to ask questions like "is this an integer dtype?". Perhaps
> > > this is
> > > easy
> > > enough to do for users, like so::
> > > 
> > >     def _get_dtype(dt_or_arr):
> > >         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype')
> > > else
> > > dt_or_arr
> > > 
> > >     def is_floating(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (float32, float64)
> > > 
> > >     def is_integer(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (uint8, uint16, uint32, uint64, int8,
> > > int16,
> > > int32,
> > > int64)
> > > 
> > > However it could make sense to add to the standard. Note that
> > > NumPy
> > > itself
> > > currently does not have a great for asking such questions, see
> > > `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > [email protected]
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > [email protected]
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion

signature.asc
Description: This is a digitally signed message part

_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] NEP: array API standard adoption (NEP 47)

Reply via email to