paleolimbot opened a new pull request, #426: URL: https://github.com/apache/arrow-nanoarrow/pull/426
This PR tweaks the implementation of packing an iterable into a buffer to avoid the very bad performance that existed previously. The optimizations added were:

- The `CBufferBuilder` now implements the buffer protocol (so that we can use `pack_into`)
- The `__len__` attribute is checked to preallocate where possible

Those optimizations resulted in a ~2x improvement over the previous code; however, the types that can use the `array` constructor have the biggest wins (5-6x improvement).

An example with the biggest gain:

```python
import numpy as np
import nanoarrow as na
import pyarrow as pa

floats = np.random.random(int(1e6))
floats_lst = list(floats)

%timeit pa.array(floats, pa.float64())
#> 1.79 µs ± 9.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%timeit pa.array(floats_lst, pa.float64())
#> 13.8 ms ± 35.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pa.array(iter(floats_lst), pa.float64())
#> 17.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit na.c_array(floats, na.float64())
#> 5.51 µs ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit na.c_array(floats_lst, na.float64())
#> 16.5 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit na.c_array(iter(floats_lst), na.float64())
#> 29.1 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Before this PR:

```python
%timeit na.c_array(floats, na.float64())
#> 5.66 µs ± 44.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit na.c_array(floats_lst, na.float64())
#> 104 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit na.c_array(iter(floats_lst), na.float64())
#> 107 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
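For readers unfamiliar with the techniques named above, here is a rough standard-library sketch of the general idea. It is not the code in this PR and does not use `CBufferBuilder`; the function names `pack_iterable` and `pack_with_array` are hypothetical, and a plain `bytearray` stands in for the buffer. It shows preallocating when `__len__` is available and writing in place with `pack_into`, with the `array` constructor as a separate fast path.

```python
import struct
from array import array


def pack_iterable(obj, fmt="d"):
    """Pack an iterable of numbers into a contiguous bytearray.

    Hypothetical helper for illustration only; not nanoarrow's API.
    """
    item = struct.Struct("=" + fmt)

    if hasattr(obj, "__len__"):
        # Length is known: allocate once and write each value in place.
        buf = bytearray(len(obj) * item.size)
        offset = 0
        for value in obj:
            item.pack_into(buf, offset, value)
            offset += item.size
        return buf

    # Length is unknown (e.g. a generator): grow the buffer as we go.
    buf = bytearray()
    for value in obj:
        buf += item.pack(value)
    return buf


def pack_with_array(obj, typecode="d"):
    """Let the array constructor consume the iterable in C.

    Only applies to types that have a matching array typecode.
    """
    return bytearray(array(typecode, obj).tobytes())
```

All three paths should produce the same bytes for the same input; the difference is how much per-element work happens in the Python interpreter, which is presumably where the gap between the list and iterator timings comes from.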