pitrou commented on issue #40874:
URL: https://github.com/apache/arrow/issues/40874#issuecomment-2025564079

   That said, even with mimalloc we're much slower than NumPy on larger arrays,
so there's perhaps another issue (are we allocating a null bitmap?):
   ```python
   >>> import pyarrow as pa
   >>> arr = pa.array([42]*100_000_000, type=pa.int64())
   >>> np_arr = arr.to_numpy()
   
   >>> %timeit arr.cast(pa.float64(), safe=False, memory_pool=pa.mimalloc_memory_pool())
   277 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
   >>> %timeit arr[:10_000_000].cast(pa.float64(), safe=False, memory_pool=pa.mimalloc_memory_pool())
   27.6 ms ± 448 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   >>> %timeit arr[:1_000_000].cast(pa.float64(), safe=False, memory_pool=pa.mimalloc_memory_pool())
   557 µs ± 2.43 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   
   >>> %timeit np_arr.astype('float64')
   126 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   >>> %timeit np_arr[:10_000_000].astype('float64')
   12.6 ms ± 45.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   >>> %timeit np_arr[:1_000_000].astype('float64')
   466 µs ± 4.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   ```
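   One quick way to probe the null-bitmap hypothesis (a sketch using the public `Array.buffers()` and `null_count` accessors, on a small array rather than the 100M-element one above) is to check whether the cast output carries a validity buffer at all:

   ```python
   import pyarrow as pa

   # Sketch: inspect whether the cast result allocated a validity (null) bitmap.
   # For a primitive fixed-width array, buffers() returns
   # [validity_buffer, data_buffer]; a validity entry of None means no
   # null bitmap was allocated for the output.
   arr = pa.array([42] * 1_000, type=pa.int64())
   cast = arr.cast(pa.float64(), safe=False)

   validity, data = cast.buffers()
   print("validity buffer:", validity)
   print("null count:", cast.null_count)
   ```

   If the validity buffer prints as a real `Buffer` even though `null_count` is 0, the cast is paying for an allocation (and possibly a memset) that NumPy's `astype` never does.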
   

