On Mon, Dec 30, 2024 at 1:51 PM Robert Kern <robert.k...@gmail.com> wrote:
> On Mon, Dec 30, 2024 at 10:28 AM Mark Harfouche via NumPy-Discussion < > numpy-discussion@python.org> wrote: > >> Happy new year everybody! >> >> I've been upgrading my code to start to support array indexing and in my >> tests I found something that was well documented, but surprising to me. >> >> I've tried to read through >> https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing >> and even after multiple passes, I still find it very terse... >> >> Consider a mutli dimensional dataset: >> >> import numpy as np >> shape = (10, 20, 30) >> original = np.arange(np.prod(shape)).reshape(shape) >> >> Let's consider we want to collapse dim 0 to a single entry >> Let's consider we want a subset from dim 1, with a slice >> Let's consider that we want want 3 elements from dim 2 >> >> i = 2 >> j = slice(1, 6) >> k = slice(7, 10) >> out_basic = original[i, j, k] >> assert out_basic.shape == (5, 3) >> >> Now consider we want to provide freedom to have instead of a slice for k, >> an arbitrary "array" >> >> k = [7, 11, 13] >> out_array = original[i, j, k] >> assert out_array.shape == (5, 3), f"shape is actually {out_array.shape}" >> >> AssertionError: shape is actually (3, 5) >> > > >> This is somewhat very surprising to me. I totally understand that things >> won't change in terms of this kind of indexing in numpy, but is there a way >> I can adjust my indexing strategy to regain the ability to slice into my >> array in a "single shot". >> >> The main usecase is for arrays that are truly huge, but chucked in ways >> where slicing into them can be quite efficient. This multi-dimensional >> imaging data. Each chunk is quite "huge" so this kind of metadata >> manipulation is worthwhile to avoid unecessary IO. >> >> Perhaps there is a "simple" distinction I am missing, for example using a >> tuple for k instead of a list???? >> > > No, there's no simple solution that you're missing. The kind of indexing > that you want has been considered in NEP 21 (which called it "orthogonal > indexing"), which saw some progress, but has largely been left fallow. > There's been some movement on the Array API specification side, so that > might spur some movement. > > https://numpy.org/neps/nep-0021-advanced-indexing.html > > Jaime Rio made a pure Python implementation that might work for you > (though I'm not sure about the performance for large arrays with big > slices), but it's buried in a close PR (still works, though): > > https://github.com/numpy/numpy/pull/5749/files > > >>> original_oindex = OrthogonalIndexer(original) > >>> original_oindex[i, j, k] > array([[1237, 1241, 1243], > [1267, 1271, 1273], > [1297, 1301, 1303], > [1327, 1331, 1333], > [1357, 1361, 1363]]) > > Note that some other array implementations like Xarray and Zarr provide > the `.oindex` property which does the orthogonal indexing semantics > (roughly) per NEP 21, so those might be options for you. > > -- > Robert Kern > Thank you Robert, I will have a read through NEP21. It seems like it discusses many aspects that I ran into.
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com