The speed difference is interesting, but that's really a separate question from the public API.
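For anyone who wants to reproduce the comparison outside IPython, here is a plain-Python sketch of the same kind of benchmark using the `timeit` module (the array size, loop counts, and function names here are my own choices, not the original measurements):

```python
import itertools
import timeit

import numpy as np

a = np.ones((500, 500))

def via_nditer():
    # element-by-element iteration with nditer
    for _ in np.nditer(a):
        pass

def via_product_index():
    # index the array with tuples from itertools.product
    for i in itertools.product(range(a.shape[0]), range(a.shape[1])):
        _ = a[i]

def via_flat():
    # a.flat is the fast path for plain element iteration
    for _ in a.flat:
        pass

for fn in (via_nditer, via_product_index, via_flat):
    ms = timeit.timeit(fn, number=5) / 5 * 1e3
    print(f"{fn.__name__}: {ms:.1f} ms per loop")
```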
I'm coming around to ndrange(). I can see how it could be useful for symbolic manipulation of arrays and indexing operations, similar to what we do in dask and xarray.

On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche <mark.harfou...@gmail.com> wrote:

> Since ndrange is a superset of the features of ndindex, we can implement
> ndindex with ndrange or keep it as is. ndindex is now a glorified `nditer`
> object anyway, so it isn't much of a maintenance burden.
>
> As for how ndindex is implemented, I'm a little worried about Python 2
> performance, seeing as range returns a list there. I would wait on
> changing the implementation of ndindex for now.
>
> I agree with Stephan that ndindex should be kept in. Many users want
> backward-compatible code, and it would be hard to justify bumping a
> dependency to bleeding-edge NumPy just for a convenience iterator.
>
> Honestly, I was really surprised to see such a speed difference; I
> thought it would be closer.
>
> Allan, I decided to run a few more benchmarks; nditer just seems slow for
> single-array access for some reason. Maybe a bug?
>
> ```
> import numpy as np
> import itertools
> a = np.ones((1000, 1000))
>
> b = {}
> for i in np.ndindex(a.shape):
>     b[i] = i
>
> %%timeit
> # op_flag=('readonly',) doesn't change performance
> for a_value in np.nditer(a):
>     pass
> 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> %%timeit
> for i in itertools.product(range(1000), range(1000)):
>     a_value = a[i]
> 113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> %%timeit
> for i in itertools.product(range(1000), range(1000)):
>     c = b[i]
> 193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>
> %%timeit
> for a_value in a.flat:
>     pass
> 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> %%timeit
> for k, v in b.items():
>     pass
> 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> %%timeit
> for i in itertools.product(range(1000), range(1000)):
>     pass
> 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> ```
>
> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <sho...@gmail.com> wrote:
>
>> I'm open to adding ndrange and "soft-deprecating" ndindex (i.e.,
>> discouraging its use in our docs, but not actually deprecating it).
>> Certainly ndrange seems like a small but meaningful improvement in the
>> interface.
>>
>> That said, I'm not convinced this is really worth the trouble. I think
>> the nested loop is still pretty readable and clear, and there are few
>> times when I've actually found ndindex() to be useful.
>>
>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <allanhald...@gmail.com>
>> wrote:
>>
>>> On 10/8/18 12:21 PM, Mark Harfouche wrote:
>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
>>> > `range`, is not an iterator. Changing this behaviour would likely
>>> > lead to breaking code that relies on that assumption, for example
>>> > anybody using introspection or code like:
>>> >
>>> > ```
>>> > indx = np.ndindex(5, 5)
>>> > next(indx)  # Don't look at the (0, 0) coordinate
>>> > for i in indx:
>>> >     print(i)
>>> > ```
>>> >
>>> > would break if `ndindex` became "not an iterator".
>>>
>>> OK, I see now. Just as Python 3 has separate range and range_iterator
>>> types, where range is sliceable, we would have separate ndrange and
>>> ndindex types, where ndrange is sliceable. You're just copying the
>>> Python 3 API. That justifies it pretty well for me.
>>>
>>> I still think we shouldn't have two functions that do nearly the same
>>> thing. We should keep one and get rid of the other. I see two ways
>>> forward:
>>>
>>> * Replace ndindex with your ndrange code, so it is no longer an
>>>   iterator. This would require some deprecation cycles for the cases
>>>   that break.
>>> * Deprecate ndindex in favor of a new function ndrange. We would keep
>>>   ndindex around for backward compatibility, with a deprecation
>>>   warning to use ndrange instead.
>>>
>>> Doing a code search on GitHub, I can see that a lot of people's code
>>> would break if ndindex were no longer an iterator. I also like the
>>> name ndrange for its allusion to Python 3's range behavior. That makes
>>> me lean towards the second option of a separate ndrange, with possible
>>> deprecation of ndindex.
>>>
>>> > itertools.product + range seems to be much faster than the current
>>> > implementation of ndindex
>>> >
>>> > (Python 3.6)
>>> > ```
>>> > %%timeit
>>> > for i in np.ndindex(100, 100):
>>> >     pass
>>> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> >
>>> > %%timeit
>>> > import itertools
>>> > for i in itertools.product(range(100), range(100)):
>>> >     pass
>>> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> > ```
>>>
>>> If the new code ends up faster than the old code, that's great, and
>>> further justification for using ndrange instead of ndindex. I had
>>> thought using nditer in the old code was fastest.
>>>
>>> So as far as I am concerned, go ahead with the PR the way you are
>>> doing it.
>>>
>>> Allan
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
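To make the range/range_iterator analogy concrete: a minimal sketch of a sliceable, re-iterable ndrange-like object could look like the following (the class name and every detail here are illustrative only, not the actual PR implementation):

```python
import itertools

class NDRange:
    """Sketch of a range-like N-D index object: unlike ndindex, it is
    not itself an iterator, and it supports per-axis slicing."""

    def __init__(self, *shape):
        self._ranges = tuple(range(n) for n in shape)

    @classmethod
    def _from_ranges(cls, ranges):
        obj = cls.__new__(cls)
        obj._ranges = ranges
        return obj

    def __iter__(self):
        # Like range(), every iter() call yields a fresh iterator,
        # so the object can be iterated more than once.
        return itertools.product(*self._ranges)

    def __len__(self):
        n = 1
        for r in self._ranges:
            n *= len(r)
        return n

    def __getitem__(self, key):
        # A tuple of per-axis indices/slices; omitted axes stay whole.
        if not isinstance(key, tuple):
            key = (key,)
        key = key + (slice(None),) * (len(self._ranges) - len(key))
        new_ranges = []
        for r, k in zip(self._ranges, key):
            if isinstance(k, slice):
                new_ranges.append(r[k])       # slicing a range gives a range
            else:
                i = r[k]                      # an int selects one index
                new_ranges.append(range(i, i + 1))
        return self._from_ranges(tuple(new_ranges))
```

With something like this, `NDRange(3, 3)[1:, 1:]` iterates only the inner 2x2 block, and calling `next()` on the object itself raises `TypeError`, which is exactly the behavioural break with ndindex discussed above.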
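And if the second option wins out, the backward-compatible shim could be as small as something like this (a hypothetical sketch; NumPy's actual deprecation machinery and ndindex internals differ):

```python
import itertools
import warnings

def ndindex(*shape):
    """Sketch of ndindex kept for backward compatibility.

    Still returns an iterator, so existing code calling next() on the
    result keeps working, while nudging users toward ndrange.
    """
    warnings.warn("ndindex is deprecated, use ndrange instead",
                  DeprecationWarning, stacklevel=2)
    # Accept both ndindex(2, 3) and ndindex((2, 3)), as np.ndindex does.
    if len(shape) == 1 and isinstance(shape[0], tuple):
        shape = shape[0]
    return itertools.product(*(range(n) for n in shape))
```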