pitrou commented on PR #46732:
URL: https://github.com/apache/arrow/pull/46732#issuecomment-2959468941

   > or when using the `indices` list on `take`?
   
   Yes, this one.
   
   > this would still have to convert from the NumPy array to an Arrow array in the previous case, right?
   
   `np.arange` is quick, and the NumPy to Arrow conversion is zero-copy.
   
   > Is Pylist to Arrow array that much slower than NumPy array to Arrow array?
   
   Much slower, since you have to convert generic Python objects to a contiguous native array.
   
   ```python
   >>> import numpy as np
   >>> import pyarrow as pa
   >>> start, stop, step = 1, 1_000_000, 2
   
   >>> %timeit np.arange(start, stop, step)
   115 μs ± 741 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   >>> %timeit pa.array(np.arange(start, stop, step))
   120 μs ± 479 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   
   >>> %timeit list(range(start, stop, step))
   13.1 ms ± 84.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   >>> %timeit pa.array(list(range(start, stop, step)))
   32.9 ms ± 56.9 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
   
   And then:
   ```python
   >>> a = pa.array(np.arange(0, 2_000_000))
   >>> %timeit a.take(np.arange(start, stop, step))
   818 μs ± 1.86 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   >>> %timeit a.take(list(range(start, stop, step)))
   33 ms ± 101 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```


