paleolimbot opened a new pull request, #450:
URL: https://github.com/apache/arrow-nanoarrow/pull/450

   In prototyping a real-world use case, I remembered that unpacking bits is 
exceedingly difficult to get right if you need to support an arbitrary 
offset/length. The math for this is very fiddly and we spent a few rounds 
getting it right in the C function `ArrowBitsUnpackInt(8|32)`. This PR makes 
that available so that we can do things like (1) convert bool arrays to numpy 
and (2) convert null masks to something that somebody else can work with (e.g., 
a numpy mask).
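
   To illustrate what "arbitrary offset/length" means here, the following is a 
minimal pure-Python sketch of the intended semantics, assuming Arrow's 
least-significant-bit-first bitmap layout. The helper name `unpack_bits` is 
hypothetical and is not part of nanoarrow; it is a slow reference, not the 
optimized C implementation.

```python
def unpack_bits(data: bytes, offset: int, length: int) -> list:
    """Slow reference for unpacking LSB-first packed bits (hypothetical helper).

    Bit i lives in byte i // 8 at bit position i % 8, counting from the
    least-significant bit, which is how Arrow lays out bitmaps.
    """
    if offset < 0 or length < 0 or offset + length > len(data) * 8:
        raise ValueError("offset/length fall outside the packed buffer")

    # One output element (0 or 1) per requested bit
    return [(data[i >> 3] >> (i & 7)) & 1 for i in range(offset, offset + length)]


# Two packed bytes; the requested range starts mid-byte and crosses a byte boundary
packed = bytes([0b10110100, 0b00000001])
print(unpack_bits(packed, offset=3, length=7))
#> [0, 1, 1, 0, 1, 1, 0]
```

A loop like this is easy to get right but slow; the fiddly part in C is 
handling the partial first and last bytes separately so that the bytes in 
between can be unpacked eight bits at a time.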
   
   This seems to be relatively performant: in the benchmark below it is nearly 
4x faster than pyarrow's `to_numpy()` on one million values (thanks to 
@WillAyd's work optimizing this!)
   
   ```python
   import numpy as np
   import nanoarrow as na
   import pyarrow as pa
   
   bool_np = np.random.random(int(1e6)) > 0.5
   bool_na = na.Array(iter(bool_np), na.bool_())
   bool_pa = pa.array(bool_np)
   
   def to_numpy_na(x):
       x_view = na.c_array(x).view()
       out = np.empty(x_view.length, bool)
       x_view.buffer(1).unpack_bits_into(out, x_view.offset, x_view.length)
       return out
   
   %timeit to_numpy_na(bool_na)
   #> 162 µs ± 812 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   
   %timeit bool_pa.to_numpy(zero_copy_only=False)
   #> 609 µs ± 833 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   ```

