On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers <ralf.gomm...@gmail.com
> >
> wrote:
> 
> > 
> > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > wieser.eric+nu...@gmail.com>
> > wrote:
> > 
> > > Perhaps worth mentioning that we've discussed this sort of API
> > > before, in
> > > https://github.com/numpy/numpy/pull/11897.
> > > 
> > > Under that proposal, the api would be something like:
> > > 
> > > * `copy=True` - always copy, like it is today
> > > * `copy=False` - copy if needed, like it is today
> > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > possible
> > > 
> > 
> > There's a couple of issues I see with using `copy` for __array__:
> > - copy is already weird (False doesn't mean no), and a [bool,
> > some_obj_or_str] keyword isn't making that better
> > - the behavior we're talking about can do more than copying, e.g.
> > for
> > PyTorch it would modify the autograd graph by adding detach(), and
> > for
> > sparse it's not just "make a copy" (which implies doubling memory
> > use) but
> > it densifies which can massively blow up the memory.
> > - I'm -1 on adding things to the main namespace (never_copy) for
> > something
> > that can be handled differently (like a string, or a new keyword)
> > 
> > tl;dr a new `force` keyword would be better
> > 
> 
> I agree, “copy” is not a good description of this desired coercion
> behavior.
> 
> A new keyword argument like “force” would be much clearer.
> 

That seems fine and practical. But, in the end it seems to me that the
`force=` keyword, just means that some projects want to teach their
users that:

1. `np.asarray()` can be expensive (and may always copy)
2. `np.asarray()` always loses type properties

while others do not choose to teach about it.  There seems very little
or even no "promise" attached to either `force=True` or `force=False`.


In the end, the question is whether sparse will actually want to
implement `force=True` if the main reason we add is for library use.
There is no difference between a visualization library or numpy. In
both cases the users memory will blow up just the same.

As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
things, but:

>>> torch.ones(10, requires_grad=True) + np.arange(10)
# RuntimeError: Can't call numpy() on Variable that requires grad. Use
var.detach().numpy() instead.

So arguably, there is no type-safety concern due to `.detach()`. There
is an (obvious) general loss of type information that always occurs
with an `np.asarray` call.
But I do not see that creating any openings for bugs here, due to the
wisdom of not allowing the above operation.
In fact, it actually seems much worse for for xarray, or pandas. They
do support the above operation and will potentially mess up if the
arange was previously an xarray with a matching index, but different
order.


I am very much in favor of adding such things, but I still lack a bit
of clarity as to whom we would be helping?

If end-users will actually use `np.asarray(..., force=True)` over
special methods, then great! But I am currently not sure the type-
safety argument is all that big of a point.  And the performance or
memory-blowup argument remains true even for visualization libraries
(where the array is purely input and never output as such).


But yes, "never copy" is a somewhat different extension to `__array__`
and `np.asarray`. It guarantees high speed and in-place behaviour which
is useful for different settings.

- Sebastian


> 
> > Cheers,
> > Ralf
> > 
> > 
> > > I think the discussion stalled on the precise spelling of the
> > > third
> > > option.
> > > 
> > > `__array__` was not discussed there, but it seems like adding the
> > > `copy`
> > > argument to `__array__` would be a perfectly reasonable
> > > extension.
> > > 
> > > Eric
> > > 
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > j...@fastmail.com>
> > > wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but
> > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > > 
> > > > 
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with
> > > > `allow_copy=True`, which would express to the downstream
> > > > library that we
> > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > 
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough
> > > > projects want to use it, then I have no objections to adding
> > > > the keyword.
> > > > OTOH, we need to be careful about adding too many
> > > > interoperability tricks
> > > > as they complicate the code and makes it hard for folks to
> > > > determine the
> > > > best solution. Interoperability is a hot topic and we need to
> > > > be careful
> > > > not put too leave behind too many experiments in the NumPy
> > > > code.  Do you
> > > > have any other ideas of how to achieve the same effect?
> > > > 
> > > > 
> > > > Personally, I don’t have any other ideas, but would be happy to
> > > > hear
> > > > some!
> > > > 
> > > > My view regarding API/experiment creep is that `__array__` is
> > > > the oldest
> > > > and most basic of all the interop tricks and that this can be
> > > > safely
> > > > maintained for future generations. Currently it only takes
> > > > `dtype=` as a
> > > > keyword argument, so it is a very lean API. I think this
> > > > particular use
> > > > case is very natural and I’ve encountered the reluctance to
> > > > implicitly copy
> > > > twice, so I expect it is reasonably common.
> > > > 
> > > > Regarding difficulty in determining the best solution, I would
> > > > be happy
> > > > to contribute to the dispatch basics guide together with the
> > > > new kwarg. I
> > > > agree that the protocols are getting quite numerous and I
> > > > couldn’t find a
> > > > single place that gathers all the best practices together. But,
> > > > to
> > > > reiterate my point: `__array__` is the simplest of these and I
> > > > think this
> > > > keyword is pretty safe to add.
> > > > 
> > > > For ease of discussion, here are the API options discussed so
> > > > far, as
> > > > well as a few extra that I don’t like but might trigger other
> > > > ideas:
> > > > 
> > > > np.asarray(my_duck_array, allow_copy=True)  # default is False,
> > > > or None
> > > > -> leave it to the duck array to decide
> > > > np.asarray(my_duck_array, copy=True)  # always copies, but, if
> > > > supported
> > > > by the duck array, defers to it for the copy
> > > > np.asarray(my_duck_array, copy=‘allow’)  # could take values
> > > > ‘allow’,
> > > > ‘force’, ’no’, True(=‘force’), False(=’no’)
> > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> > > > separate
> > > > concepts, but unclear what force_copy=True, allow_copy=False
> > > > means!
> > > > np.asarray(my_duck_array, force=True)
> > > > 
> > > > Juan.
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

Reply via email to