paleolimbot opened a new pull request, #426:
URL: https://github.com/apache/arrow-nanoarrow/pull/426

   This PR tweaks the implementation of packing an iterable into a buffer to 
avoid the very poor performance of the previous implementation. The 
optimizations added were:
   
   - The `CBufferBuilder` now implements the buffer protocol (so that we can 
use `pack_into`)
   - The iterable is checked for a `__len__` attribute so that the buffer can be preallocated where possible
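
As a rough illustration of the two points above, here is a minimal pure-Python sketch that packs an iterable into a preallocated buffer. It is not the code in this PR: `pack_iterable` is a hypothetical helper, a `bytearray` stands in for `CBufferBuilder`, and it assumes `pack_into` refers to the standard library's `struct.Struct.pack_into`.

```python
import struct


def pack_iterable(values, format_char="d"):
    """Pack an iterable of numbers into a bytearray (hypothetical helper)."""
    item = struct.Struct("=" + format_char)

    # Preallocate when the iterable reports its length; otherwise start empty.
    try:
        n = len(values)
    except TypeError:
        n = 0
    buf = bytearray(n * item.size)

    offset = 0
    for value in values:
        if offset + item.size > len(buf):
            # Grow geometrically so packing from an iterator stays amortized O(1).
            buf.extend(bytes(max(len(buf), item.size)))
        # pack_into writes in place into any writable buffer-protocol object.
        item.pack_into(buf, offset, value)
        offset += item.size

    # Trim any unused capacity left over from growth or an overlong length hint.
    del buf[offset:]
    return buf


packed_list = pack_iterable([1.0, 2.0, 3.0])        # length known: preallocated
packed_iter = pack_iterable(iter([1.0, 2.0, 3.0]))  # length unknown: grows
```

Both calls should produce the same bytes; the only difference is whether the buffer had to grow while packing.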
   
   Those optimizations resulted in a ~2x improvement over the previous code; 
however, the types that can use the `array` constructor have the biggest wins 
(5-6x improvement).
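
To show why that path can be faster, here is a small sketch assuming the `array` constructor mentioned above is the standard library's `array.array`. It only supports a fixed set of integer and floating-point type codes, which would explain why only some types benefit; how nanoarrow chooses the type code is not shown here.

```python
from array import array

values = [0.5, 1.5, 2.5]

# array() consumes a list or any other iterable in C, with no per-item
# Python-level pack call.
from_list = array("d", values)
from_iter = array("d", iter(values))

# The result exposes the buffer protocol, so it can be viewed without copying.
view = memoryview(from_list)
```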
   
   An example with the biggest gain:
   
   ```python
   import numpy as np
   import nanoarrow as na
   import pyarrow as pa
   
   floats = np.random.random(int(1e6))
   floats_lst = list(floats)
   
   %timeit pa.array(floats, pa.float64())
   #> 1.79 µs ± 9.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
   %timeit pa.array(floats_lst, pa.float64())
   #> 13.8 ms ± 35.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   %timeit pa.array(iter(floats_lst), pa.float64())
   #> 17.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   %timeit na.c_array(floats, na.float64())
   #> 5.51 µs ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   %timeit na.c_array(floats_lst, na.float64())
   #> 16.5 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   %timeit na.c_array(iter(floats_lst), na.float64())
   #> 29.1 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
   
   Before this PR:
   
   ```python
   %timeit na.c_array(floats, na.float64())
   #> 5.66 µs ± 44.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   %timeit na.c_array(floats_lst, na.float64())
   #> 104 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   %timeit na.c_array(iter(floats_lst), na.float64())
   #> 107 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
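
For reference, the speedups implied by the two sets of timings can be worked out directly. This sketch only restates the `%timeit` means reported above; the dictionary names are illustrative.

```python
# Mean timings in milliseconds, copied from the %timeit output above.
before_ms = {"list": 104.0, "iterator": 107.0}
after_ms = {"list": 16.5, "iterator": 29.1}

speedup = {kind: before_ms[kind] / after_ms[kind] for kind in before_ms}
for kind, ratio in speedup.items():
    print(f"{kind}: {ratio:.1f}x faster")
```

For this float64 example that works out to roughly 6.3x for the list input and 3.7x for the iterator input.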

