Hi All, `nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the plan to use `asarray` - sorry to have been sloppy with stating they "convert to array".
And, yes, I'll admit to forgetting that we are introducing `where` only in 1.17 - clearly we cannot rely on other classes to have already implemented it!! (This is the problem with having done it myself - it seems ages ago!) For the old implementation, there was indeed only one way to override (`__array_function__`) for non-subclasses, but for the new implementation there are three ways, though the first two overlap: - support `.sum()` with all its arguments and `__array_ufunc__` for `isnan` - support `__array_ufunc__` including reductions (with all their arguments) - support `__array_function__` I think that eventually one would like to move to the case where there is only one (`__array_ufunc__` in this case, since really it is just a combination of two ufuncs, isnan and add.reduce in this case). Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single-way to override? Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold-turkey if the new implementation is covered by `__array_ufunc__`? All the best, Marten On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gomm...@gmail.com> wrote: > > > On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <sho...@gmail.com> wrote: > >> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < >> m.h.vankerkw...@gmail.com> wrote: >> >>> Hi Ralf, >>> >>> You're right, the problem is with the added keyword argument (which >>> would appear also if we did not still have to support the old .sum method >>> override but just dispatched to __array_ufunc__ with `np.add.reduce` - >>> maybe worse given that it seems likely the reduce method has seen much less >>> testing in __array_ufunc__ implementations). >>> >>> Still, I do think the question stands: we implement a `nansum` for our >>> ndarray class a certain way, and provide ways to override it (three now, in >>> fact). Is it really reasonable to expect that we wait 4 versions for other >>> packages to keep up with this, and thus get stuck with given internal >>> implementations? >>> >> > In general, I'd say yes. This is a problem of our own making. If an > implementation uses np.sum on an argument that can be a subclass or duck > array (i.e. we didn't coerce with asarray), then the code is calling out to > outside of numpy. At that point we have effectively made something public. > We can still change it, but only if we're sure that we don't break things > in the process (i.e. we then insert a new asarray, something you're not > happy about in general ....). > > I don't get the "three ways to override nansum" by the way. There's only > one I think: __array_function__. There may be more for the sum() method. > > Another way to say the same thing: if a subclass does everything right, > and we decide to add a new keyword to some function in 1.17 without > considering those subclasses, then turning around straight after and saying > "oh those subclasses rely on our old API, it may be okay to either break > them or send them a deprecation warning" (which is I think what you were > advocating for - your 4 and 3 options) is not reasonable. > > >>> Aside: note that the present version of the nanfunctions relies on >>> turning the arguments into arrays and copying 0s into it - that suggests >>> that currently they do not work for duck arrays like Dask. >>> >> > There being an issue with matrix in the PR suggests there is an issue for > subclasses at least? > > >> Agreed. We could safely rewrite things to use np.asarray(), without any >> need to worry about backends compatibility. From an API perspective, >> nothing would change -- we already cast inputs into base numpy arrays >> inside the _replace_nan() routine. >> > > In that case yes, nansum can be rewritten. However, it's only because of > the use of (as)array - if it were asanyarray that would already scupper the > plan. > > Cheers, > Ralf > > >> >>> >>> All the best, >>> >>> Marten >>> >>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gomm...@gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt < >>>> stef...@berkeley.edu> wrote: >>>> >>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote: >>>>> > In a way, I brought it up mostly as a concrete example of an internal >>>>> > implementation which we cannot change to an objectively cleaner one >>>>> because >>>>> > other packages rely on an out-of-date numpy API. >>>>> >>>> >>>> I think this is not the right way to describe the problem (see below). >>>> >>>> >>>>> This, and the comments Nathaniel made on the array function thread, are >>>>> important to take note of. Would it be worth updating NEP 18 with a >>>>> list of pitfalls? Or should this be a new informational NEP that >>>>> discusses—on a higher level—the benefits, risks, and design >>>>> considerations of providing protocols? >>>>> >>>> >>>> That would be a nice thing to do (the higher level one), but in this >>>> case I think the issue has little to do with NEP 18. The summary of the >>>> issue in this thread is a little brief, so let me try to clarify. >>>> >>>> 1. np.sum gained a new `where=` keyword in 1.17.0 >>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try >>>> to use that >>>> 3. the `_wrapreduction` utility that forwards the function to the >>>> method will compare signatures of np.sum and x.sum, and throw an error if >>>> there's a mismatch for any keywords that have a value other than the >>>> default np._NoValue >>>> >>>> Code to check this: >>>> >>> x1 = np.arange(5) >>>> >>> x2 = np.asmatrix(x1) >>>> >>> np.sum(x1) # works >>>> >>> np.sum(x2) # works >>>> >>> np.sum(x1, where=x1>3) # works >>>> >>> np.sum(x2, where=x2>3) # _wrapreduction throws TypeError >>>> ... >>>> TypeError: sum() got an unexpected keyword argument 'where' >>>> >>>> Note that this is not specific to np.matrix. Using pandas.Series you >>>> also get a TypeError: >>>> >>> y = pd.Series(x1) >>>> >>> np.sum(y) # works >>>> >>> np.sum(y, where=y>3) # pandas throws TypeError >>>> ... >>>> TypeError: sum() got an unexpected keyword argument 'where' >>>> >>>> The issue is that when we have this kind of forwarding logic, >>>> irrespective of how it's implemented, new keywords cannot be used until the >>>> array-like objects with the methods that get forwarded to gain the same >>>> keyword. >>>> >>>> tl;dr this is simply a cost we have to be aware of when either >>>> proposing to add new keywords, or when proposing any kind of dispatching >>>> logic (in this case `_wrapreduction`). >>>> >>>> Regarding internal use of `np.sum(..., where=)`: this should not be >>>> done until at least 4-5 versions from now, and preferably way longer than >>>> that. Because doing so will break already released versions of Pandas, >>>> Dask, and other libraries with array-like objects. >>>> >>>> Cheers, >>>> Ralf >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion@python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion