[Numpy-discussion] Re: surprising behavior from array indexing

Robert Kern via NumPy-Discussion Mon, 30 Dec 2024 10:53:12 -0800

On Mon, Dec 30, 2024 at 10:28 AM Mark Harfouche via NumPy-Discussion <
numpy-discussion@python.org> wrote:


> Happy new year everybody!
>
> I've been upgrading my code to start to support array indexing and in my
> tests I found something that was well documented, but surprising to me.
>
> I've tried to read through
> https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing
> and even after multiple passes, I still find it very terse...
>
> Consider a mutli dimensional dataset:
>
> import numpy as np
> shape = (10, 20, 30)
> original = np.arange(np.prod(shape)).reshape(shape)
>
> Let's consider we want to collapse dim 0 to a single entry
> Let's consider we want a subset from dim 1, with a slice
> Let's consider that we want want 3 elements from dim 2
>
> i = 2
> j = slice(1, 6)
> k = slice(7, 10)
> out_basic = original[i, j, k]
> assert out_basic.shape == (5, 3)
>
> Now consider we want to provide freedom to have instead of a slice for k,
> an arbitrary "array"
>
> k = [7, 11, 13]
> out_array = original[i, j, k]
> assert out_array.shape == (5, 3), f"shape is actually {out_array.shape}"
>
> AssertionError: shape is actually (3, 5)
>


> This is somewhat very surprising to me. I totally understand that things
> won't change in terms of this kind of indexing in numpy, but is there a way
> I can adjust my indexing strategy to regain the ability to slice into my
> array in a "single shot".
>
> The main usecase is for arrays that are truly huge, but chucked in ways
> where slicing into them can be quite efficient. This multi-dimensional
> imaging data. Each chunk is quite "huge" so this kind of metadata
> manipulation is worthwhile to avoid unecessary IO.
>
> Perhaps there is a "simple" distinction I am missing, for example using a
> tuple for k instead of a list????
>

No, there's no simple solution that you're missing. The kind of indexing
that you want has been considered in NEP 21 (which called it "orthogonal
indexing"), which saw some progress, but has largely been left fallow.
There's been some movement on the Array API specification side, so that
might spur some movement.

https://numpy.org/neps/nep-0021-advanced-indexing.html

Jaime Rio made a pure Python implementation that might work for you (though
I'm not sure about the performance for large arrays with big slices), but
it's buried in a close PR (still works, though):

https://github.com/numpy/numpy/pull/5749/files

>>> original_oindex = OrthogonalIndexer(original)
>>> original_oindex[i, j, k]
array([[1237, 1241, 1243],
       [1267, 1271, 1273],
       [1297, 1301, 1303],
       [1327, 1331, 1333],
       [1357, 1361, 1363]])

Note that some other array implementations like Xarray and Zarr provide the
`.oindex` property which does the orthogonal indexing semantics (roughly)
per NEP 21, so those might be options for you.

-- 
Robert Kern

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

[Numpy-discussion] Re: surprising behavior from array indexing

Reply via email to