Hi Ralf,

You're right, the problem is with the added keyword argument (which would
appear also if we did not still have to support the old .sum method
override but just dispatched to __array_ufunc__ with `np.add.reduce` -
maybe worse given that it seems likely the reduce method has seen much less
testing in  __array_ufunc__ implementations).

Still, I do think the question stands: we implement a `nansum` for our
ndarray class a certain way, and provide ways to override it (three now, in
fact). Is it really reasonable to expect that we wait 4 versions for other
packages to keep up with this, and thus get stuck with given internal
implementations?

Aside: note that the present version of the nanfunctions relies on turning
the arguments into arrays and copying 0s into it - that suggests that
currently they do not work for duck arrays like Dask.

All the best,

Marten

On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:

>
>
> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <stef...@berkeley.edu>
> wrote:
>
>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>> > In a way, I brought it up mostly as a concrete example of an internal
>> > implementation which we cannot change to an objectively cleaner one
>> because
>> > other packages rely on an out-of-date numpy API.
>>
>
> I think this is not the right way to describe the problem (see below).
>
>
>> This, and the comments Nathaniel made on the array function thread, are
>> important to take note of.  Would it be worth updating NEP 18 with a
>> list of pitfalls?  Or should this be a new informational NEP that
>> discusses—on a higher level—the benefits, risks, and design
>> considerations of providing protocols?
>>
>
> That would be a nice thing to do (the higher level one), but in this case
> I think the issue has little to do with NEP 18. The summary of the issue in
> this thread is a little brief, so let me try to clarify.
>
> 1. np.sum gained a new `where=` keyword in 1.17.0
> 2. using np.sum(x) will detect a `x.sum` method if it's present and try to
> use that
> 3. the `_wrapreduction` utility that forwards the function to the method
> will compare signatures of np.sum and x.sum, and throw an error if there's
> a mismatch for any keywords that have a value other than the default
> np._NoValue
>
> Code to check this:
> >>> x1 = np.arange(5)
> >>> x2 = np.asmatrix(x1)
> >>> np.sum(x1)  # works
> >>> np.sum(x2)  # works
> >>> np.sum(x1, where=x1>3)  # works
> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> Note that this is not specific to np.matrix. Using pandas.Series you also
> get a TypeError:
> >>> y = pd.Series(x1)
> >>> np.sum(y)  # works
> >>> np.sum(y, where=y>3)  # pandas throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> The issue is that when we have this kind of forwarding logic, irrespective
> of how it's implemented, new keywords cannot be used until the array-like
> objects with the methods that get forwarded to gain the same keyword.
>
> tl;dr this is simply a cost we have to be aware of when either proposing
> to add new keywords, or when proposing any kind of dispatching logic (in
> this case `_wrapreduction`).
>
> Regarding internal use of  `np.sum(..., where=)`: this should not be done
> until at least 4-5 versions from now, and preferably way longer than that.
> Because doing so will break already released versions of Pandas, Dask, and
> other libraries with array-like objects.
>
> Cheers,
> Ralf
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

Reply via email to