On 2019-09-07 15:33, Ralf Gommers wrote:
On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg <sebast...@sipsolutions.net> wrote:

On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:


<snip>

That's part of it. The concrete problems it's solving are threefold:

1. Array creation functions can be overridden.
2. Array coercion is now covered.
3. "Default implementations" will allow you to implement your own NumPy-like array more easily, when such efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "defaults".


There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
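
To sketch what that opt-in could look like: the snippet below assumes unumpy keeps uarray's `set_backend` context manager, that unumpy covers the fft namespace, and that someone ships an `MklFftBackend` object. All three are assumptions for illustration, not shipped APIs.

```python
import numpy as np
import uarray as ua
import unumpy as unp                       # drop-in mirror of the NumPy API
from mkl_fft_backend import MklFftBackend  # hypothetical backend wrapping mkl_fft

arr = np.ones(64, dtype=complex)

with ua.set_backend(MklFftBackend):
    out = unp.fft.fft(arr)  # routed to mkl_fft, even for plain ndarrays,
                            # with no monkeypatching of np.fft
```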

Another example is einsum: if you want to use opt_einsum for all inputs (including ndarrays), then you cannot use np.einsum. And yet another is using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html) for the nan-functions and partition. There are likely more of these.
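
For illustration, the einsum case today: opt_einsum's `contract` is a drop-in replacement for `np.einsum`, but every call site has to be edited to use it, since nothing reroutes `np.einsum` itself:

```python
import numpy as np
import opt_einsum

a = np.random.rand(10, 20)
b = np.random.rand(20, 5)

# Same contraction, but spelled opt_einsum.contract instead of np.einsum:
res = opt_einsum.contract('ij,jk->ik', a, b)
```

With a backend mechanism, the `np.einsum` call sites could stay untouched while being served by opt_einsum.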

The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily an either/or; they can be complementary.


Let me try to move the discussion from the github issue here (this may not be the best place): https://github.com/numpy/numpy/issues/14441, which asked for easier creation functions together with `__array_function__`.

I think an important note mentioned here is how users interact with unumpy vs. `__array_function__`. The former is an explicit opt-in, while the latter is an implicit choice based on an `array-like` abstract base class and functional, type-based dispatching.
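
To contrast the two modes in code: the explicit half below is commented out because the `DaskBackend` object is illustrative; uarray does ship `set_backend`, but backend objects for specific libraries are an assumption here.

```python
import numpy as np
import dask.array as da

dask_arr = da.ones((100,))

# Implicit (__array_function__): dispatch follows the input's type;
# the user never names a backend.
out = np.mean(dask_arr)   # returns a lazy dask result

# Explicit (unumpy/uarray-style): the caller selects a backend, which
# can apply even to plain ndarray inputs.
# with ua.set_backend(DaskBackend):
#     out = unp.mean(np.ones(100))
```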

To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist."

(I do think this is a point we should not just ignore; `uarray` is a thin layer, but it has a big surface area.)

Now there are things where explicit opt-in is obvious, and the FFT example is one of those: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. And right now I think these are _very_ different.

Now, for end users choosing one array-like over another, an implicit mechanism seems nicer (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors would strive for implicit support (i.e. sklearn, skimage, scipy).

Circling back to creation and coercion: in a purely object-based type system, these would be classmethods, I guess, but in NumPy and the libraries above, we are lost.
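
(Hypothetically, classmethod-style creation would look like the sketch below; no array library actually offers this.)

```python
# Classmethod-style creation: the result type would follow the class, so
# an ndarray input yields an ndarray, a dask array yields a dask array.
idx = type(exp).arange(len(exp))
```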

Solution 1: Create explicit opt-in, e.g. through uarray (NEP 31).

1. Requires end-user opt-in.
2. Seems cleaner in many ways.
3. Requires a full copy of the API.

Bullets 1 and 3 are not required: if we decide to make it the default, then there is no separate namespace.

It does require explicit opt-in to have any benefits to the user.


Solution 2: Add some coercion "protocol" (NEP 30) and expose a way to create new arrays more conveniently. This would practically mean adding an `array_type=np.ndarray` argument.

1. _Not_ used by end users! End users should use dask.linspace!
2. Adds a "strange" API somewhere in numpy, and possibly a new "protocol" (in addition to coercion). [2]

I still feel these solve different issues. The second one is intended to make array-likes work implicitly in libraries (without end users having to do anything), while the first seems to force the end user to opt in, sometimes unnecessarily:

def my_library_func(array_like):
    exp = np.exp(array_like)
    idx = np.arange(len(exp))
    return idx, exp

Would have all the information for implicit opt-in/array-like support, but cannot do it right now.

Can you explain this a bit more? `len(exp)` is a number, so
`np.arange(number)` doesn't really have any information here.


Right, but as a library author, I want a way to make it use the same type as `array_like` in this particular function; that is the point! The end user already signaled they prefer, say, dask, via the array that was actually passed in. (But this is just repeating what is below, I think.)
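
For illustration, the desired behavior as a sketch; the `array_type` keyword is the hypothetical argument floated above, not an existing NumPy parameter:

```python
import numpy as np

def my_library_func(array_like):
    exp = np.exp(array_like)   # already dispatches via __array_function__
    # Hypothetical: create the index array in the same array family as exp.
    idx = np.arange(len(exp), array_type=type(exp))
    return idx, exp
```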

This is what I have been wondering: whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in).

Good question. If that needs to work in the absence of the user doing anything, it should be something like

with unumpy.determine_backend(exp):
    unumpy.arange(len(exp))   # or np.arange if we make unumpy the default

to get the equivalent of `np.arange_like(len(exp), array_type=exp)`.

Note that the `determine_backend` thing doesn't exist today.
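
To make the idea concrete, here is a rough sketch of what such a helper might look like, assuming uarray's existing `set_backend` context manager plus a hypothetical registry mapping array types to backends (again, none of this exists today):

```python
from contextlib import contextmanager
import uarray as ua

_BACKEND_REGISTRY = {}   # hypothetical: maps array types to unumpy backends

@contextmanager
def determine_backend(array):
    # Look up the backend registered for this array's type and activate it,
    # so subsequent unumpy calls produce arrays of the matching kind.
    backend = _BACKEND_REGISTRY[type(array)]
    with ua.set_backend(backend):
        yield
```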


Exactly, that is what I have been wondering about; there may be more issues around that. If it existed, we may be able to solve the implicit library usage by making libraries use unumpy (or similar), although at that point we'd half replace `__array_function__`, maybe. However, the main point is that without such functionality, NEP 30 and NEP 31 seem to solve slightly different issues with respect to how they interact with the end user (opt-in)?

We may decide that we do not want to solve the library users' issue of wanting to support implicit opt-in for array-like inputs, because it is a rabbit hole. But we may need to discuss/argue a bit more that it really is a deep enough rabbit hole that it is not worth the trouble.

The reason is simply that, right now, I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().

I don't get the last part. The arange is inside a library function, so
a user can't just go in and change things there.

A "user" here means "end user". An end user writes a script, and they can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`, or more likely just use one within one script and the other within another script, while both use the same sklearn functions.
(Although using a backend switching may be nicer in some contexts)

A library provider (a library using unumpy/numpy) of course cannot just use dask conveniently, unless they write their own `guess_numpy_like_module()` function first.
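
For concreteness, a naive version of what a library would have to write today (the helper name comes from the message above; a real implementation would need a proper registry rather than hard-coded checks):

```python
import numpy as np

def guess_numpy_like_module(arr):
    # Return the module whose creation functions match arr's type.
    try:
        import dask.array as da
        if isinstance(arr, da.Array):
            return da
    except ImportError:
        pass
    return np  # fall back to plain NumPy

# Usage: xp = guess_numpy_like_module(exp); idx = xp.arange(len(exp))
```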


Cheers,

Ralf

Cheers,

Sebastian

[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing. The main differences seem to me to be that:
1. monkeypatching was not done explicitly
   (import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. a backend system allows libraries to prefer one locally?
   (which I think is a big advantage)

[2] There are the options of adding `linspace_like` functions somewhere in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?) and making it `ndarray.__numpy_like_creation_functions__.arange()`.

Actually, after writing this I just realized something. With 1.17.x we have:

```
In [1]: import numpy as np; import dask.array as da

In [2]: d = da.from_array(np.linspace(0, 1))

In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
```

In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.

The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear in keeping it separate is that the simple act of requiring a pip install unumpy will keep people from using it or trying it out.

Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.

Cheers,
Ralf


