> Begin forwarded message:
>
> From: Stephan Hoyer
> Date: Friday, Nov 09, 2018 at 3:19 PM
> To: Hameer Abbasi
> Cc: Stefan van der Walt , Marten van Kerkwijk
> Subject: asarray/anyarray; matrix/subclass
>
> This is a great discussion, but let's try to have it in public (e.g., on the 
> NumPy mailing list).
> On Fri, Nov 9, 2018 at 8:42 AM Hameer Abbasi <einstein.edi...@gmail.com 
> (mailto:einstein.edi...@gmail.com)> wrote:
> > Hi Stephan,
> >
> > The issue I have with writing another function is that asarray/asanyarray 
> > are so widely used that it’d be a huge maintenance task to update them 
> > throughout NumPy, not to mention in other codebases, which would also have 
> > to rely on newer NumPy versions for this. In short, it would dramatically 
> > reduce the adoptability of the new function.
> >
> > One path we can take is to allow asarray/asanyarray to be overridable via 
> > __array_function__ (the former is debatable). This solves most of our 
> > duck-array related issues without introducing another protocol.
> >
> > Regardless of what path we choose, I would recommend changing asanyarray so 
> > that it no longer passes np.matrix through, returning 
> > mat.view(type=np.ndarray) instead, which has O(1) cost in time and memory. 
> > In the vast majority of contexts, asanyarray is used to ensure an array-ish 
> > structure for another operation, and usually there’s no guarantee that what 
> > comes out will be a matrix anyway. I suggest we raise a FutureWarning and 
> > then change this behaviour.
> >
> > There have been a number of discussions about deprecating np.matrix (and a 
> > few about MaskedArray as well, though there are less compelling reasons for 
> > that one). I suggest we start down that path as soon as possible. The 
> > biggest (only?) user I know of blocking that is scipy.sparse, and we’re on 
> > our way to replacing that with PyData/Sparse.
> >
> > Best Regards,
> > Hameer Abbasi
> >
> >
> > > On Friday, Nov 09, 2018 at 1:26 AM, Stephan Hoyer <sho...@gmail.com 
> > > (mailto:sho...@gmail.com)> wrote:
> > > Hi Hameer,
> > >
> > > I'd love to talk about this in more detail. I agree that something like 
> > > this is needed.
> > >
> > > The challenge with reusing an existing function like asanyarray() is that 
> > > there is at least one (somewhat?) widely used ndarray subclass that badly 
> > > violates the Liskov Substitution Principle: np.matrix.
> > >
> > > NumPy can't really use np.asanyarray() widely for internal purposes until 
> > > we no longer have to worry about np.matrix. We might special case np.matrix 
> > > in some way, but then asanyarray() would do totally opposite things on 
> > > different versions of NumPy. It's almost certainly a better idea to just 
> > > write a new function with the desired semantics, and "soft deprecate" 
> > > asanyarray(). The new function can explicitly black list np.matrix, as 
> > > well as any other subclasses we know of that badly violate LSP.
> > >
> > > Cheers,
> > > Stephan
> > > On Thu, Nov 8, 2018 at 5:06 PM Hameer Abbasi <einstein.edi...@gmail.com 
> > > (mailto:einstein.edi...@gmail.com)> wrote:
> > > > No, Stefan, I’ll do that now. Putting you in the cc.
> > > >
> > > > It slipped my mind among the million other things I had in mind — 
> > > > Namely: My job visa. It was only done this Monday.
> > > >
> > > > Hi, Marten, Stephan:
> > > >
> > > > Stefan wants me to write up a NEP that allows a given object to specify 
> > > > that it is a duck array, namely, that it follows duck-array semantics.
> > > >
> > > > We were thinking of changing asanyarray to pass through anything that 
> > > > implements the duck-array protocol, along with ndarray subclasses. I’m 
> > > > sure this would help XArray and Quantity work better with existing 
> > > > codebases, along with PyData/Sparse arrays.
> > > >
> > > > Would you be interested?
> > > >
> > > > Best Regards,
> > > > Hameer Abbasi
> > > >
> > > >
> > > > > On Thursday, Nov 08, 2018 at 9:09 PM, Stefan van der Walt 
> > > > > <stef...@berkeley.edu (mailto:stef...@berkeley.edu)> wrote:
> > > > > Hi Hameer,
> > > > >
> > > > > In last week's meeting, we had the following in the notes:
> > > > >
> > > > > > Hameer is contacting Marten & Stephan and write up a draft NEP for
> > > > > > clarifying the asarray/asanyarray and matrix/subclass path forward.
> > > > >
> > > > > Did any of that happen that you could share?
> > > > >
> > > > > Thanks and best regards,
> > > > > Stéfan

Hello, everyone,

Stefan van der Walt, Stephan Hoyer, Marten van Kerkwijk and I have been 
discussing the state of matrix, asarray and asanyarray. Our thoughts are 
summarised above (in the quoted text that I’m forwarding).

Basically, this grew out of a discussion relating to asanyarray/asarray 
inconsistencies in NumPy about which to use where. Historically, asarray was 
used in many libraries/places instead of asanyarray usually because np.matrix 
caused problems due to its special behaviour with regard to indexing (it always 
returns a 2-D object when eliminating one dimension, but a 0-D one when 
eliminating both), its behaviour regarding __mul__ (the multiplication operator 
represents matrix multiplication rather than element-wise multiplication) and 
its fixed dimensionality (matrix is 2D only). Because of these three things, as 
Stephan accurately pointed out, it violates the Liskov Substitution Principle.
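
To make the three points concrete, here is a small sketch comparing np.matrix with a plain ndarray over the same data. The variable names are only illustrative, and creating an np.matrix may emit a PendingDeprecationWarning depending on the NumPy version.

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])   # ndarray subclass
a = np.asarray(m)                 # plain ndarray over the same data

# 1. Indexing: dropping one dimension still yields a 2-D matrix,
#    while dropping both yields a 0-D scalar.
row_shape_matrix = m[0].shape     # (1, 2)
row_shape_array = a[0].shape      # (2,)
scalar_ndim = np.ndim(m[0, 0])    # 0

# 2. __mul__: matrix product for np.matrix, element-wise for ndarray.
matrix_product = np.asarray(m * m)   # [[7, 10], [15, 22]]
elementwise = a * a                  # [[1, 4], [9, 16]]

# 3. Fixed dimensionality: flattening a matrix still gives a 2-D result.
flat_shape_matrix = m.ravel().shape  # (1, 4)
flat_shape_array = a.ravel().shape   # (4,)
```

Code written against ndarray semantics would get different shapes and different products when handed a matrix, which is the substitution problem described above.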

Because of this behaviour, many libraries switched from using asanyarray to 
asarray, as np.matrix wouldn’t work with their code. This shut out other 
ndarray subclasses from being used as well, such as MaskedArray and 
astropy.Quantity. Even if asanyarray is used, there is usually no guarantee 
that a matrix will be returned instead of an array.
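
As a small illustration of what gets lost when a library calls asarray, the sketch below uses MaskedArray (the values are made up for the example):

```python
import numpy as np

masked = np.ma.MaskedArray([1.0, 2.0, 3.0], mask=[False, True, False])

as_base = np.asarray(masked)      # subclass stripped: the mask is dropped
as_any = np.asanyarray(masked)    # subclass preserved: the mask survives

base_sum = as_base.sum()          # sums all three values
masked_sum = as_any.sum()         # skips the masked entry
```

Here as_base is a plain ndarray, so the masked value 2.0 silently takes part in the sum, while as_any keeps it excluded.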

The changes I’m proposing are twofold, but simple:

1. asanyarray should return mat.view(type=np.ndarray) instead of matrices, 
   after a FutureWarning and the usual grace period. This preserves both 
   performance (creating a view is O(1) in memory and time) and the ability 
   to mutate the original matrix through the result.
2. In the spirit of allowing duck arrays to work with existing NumPy code, 
   asanyarray should be overridable via __array_function__, so that duck 
   arrays can decide whether to pass themselves through. If subclasses are 
   allowed through, duck arrays should be as well.
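
A rough sketch of how the two proposed changes might behave, purely for discussion. The names proposed_asanyarray and DuckArray are hypothetical and are not part of NumPy. The real np.asanyarray is not dispatched through __array_function__, so the protocol call here is emulated by hand.

```python
import warnings
import numpy as np


def proposed_asanyarray(obj):
    """Hypothetical stand-in for the proposed asanyarray behaviour."""
    # Change 1: matrices come back as plain ndarray views (no copy).
    if isinstance(obj, np.matrix):
        warnings.warn(
            "asanyarray will return an ndarray view for np.matrix input",
            FutureWarning,
            stacklevel=2,
        )
        return obj.view(type=np.ndarray)

    # Change 2 (assumed mechanism): let a non-ndarray duck array decide,
    # via __array_function__, whether to pass itself through.
    handler = getattr(type(obj), "__array_function__", None)
    if handler is not None and not isinstance(obj, np.ndarray):
        result = handler(obj, proposed_asanyarray, (type(obj),), (obj,), {})
        if result is not NotImplemented:
            return result

    return np.asanyarray(obj)


class DuckArray:
    """Hypothetical duck array that opts in to passing through."""

    def __array_function__(self, func, types, args, kwargs):
        if func is proposed_asanyarray:
            return self
        return NotImplemented


m = np.matrix([[1, 2], [3, 4]])
arr = proposed_asanyarray(m)   # emits FutureWarning, returns an ndarray view
arr[0, 0] = 99                 # writes through to m, since no copy was made
```

The view shares memory with the original matrix, which is what keeps the cost O(1) and lets writes remain visible in the matrix.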

This is a part of a larger effort to deprecate np.matrix. As far as I’m aware, 
it has one big customer (scipy.sparse). The effort to replace that is already 
underway at PyData/Sparse.

Best Regards,
Hameer Abbasi
