paleolimbot opened a new pull request, #450:
URL: https://github.com/apache/arrow-nanoarrow/pull/450
In prototyping a real-world use case, I remembered that unpacking bits is
exceedingly difficult to get right if you need to support an arbitrary
offset/length. The math for this is very fiddly and we spent a few rounds
getting it right in the C function `ArrowBitsUnpackInt(8|32)`. This PR makes
that available so that we can do things like (1) convert bool arrays to numpy
and (2) convert null masks to something that somebody else can work with (e.g.,
a numpy mask).
In a quick benchmark on one million random booleans, this seems to be
relatively performant (thanks to @WillAyd's work optimizing this!):
```python
import numpy as np
import nanoarrow as na
import pyarrow as pa

# One million random booleans as NumPy, nanoarrow, and pyarrow arrays
bool_np = np.random.random(int(1e6)) > 0.5
bool_na = na.Array(iter(bool_np), na.bool_())
bool_pa = pa.array(bool_np)


def to_numpy_na(x):
    x_view = na.c_array(x).view()
    out = np.empty(x_view.length, bool)
    # Buffer 1 of a bool array is the bit-packed data buffer
    x_view.buffer(1).unpack_bits_into(out, x_view.offset, x_view.length)
    return out


%timeit to_numpy_na(bool_na)
#> 162 µs ± 812 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit bool_pa.to_numpy(False)
#> 609 µs ± 833 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
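For reviewers who want a mental model of the offset/length bookkeeping mentioned above, here is a minimal pure-Python sketch. It is not the C implementation and not part of this PR: the helper name `unpack_bits` is hypothetical, and the sketch only assumes Arrow's least-significant-bit-first bitmap layout. It favours clarity over speed.

```python
def unpack_bits(packed: bytes, offset: int, length: int) -> list:
    """Unpack `length` bits starting at bit `offset` from an LSB-first bitmap.

    Hypothetical reference helper for illustration only.
    """
    if offset < 0 or length < 0 or offset + length > len(packed) * 8:
        raise ValueError("offset/length out of range for the packed buffer")

    out = []
    for i in range(offset, offset + length):
        byte = packed[i // 8]  # which byte holds bit i
        out.append(bool((byte >> (i % 8)) & 1))  # bit i % 8, counted from the LSB
    return out


# Bits 2..8 of a two-byte bitmap (the range crosses a byte boundary)
packed = bytes([0b10110100, 0b00000001])
print(unpack_bits(packed, 2, 7))
# → [True, False, True, True, False, True, True]

# A validity bitmap marks valid elements with 1, whereas a NumPy-style mask
# marks masked (null) elements with True, so the unpacked bits are inverted.
validity = bytes([0b00001101])
null_mask = [not is_valid for is_valid in unpack_bits(validity, 0, 4)]
```

A loop like this is easy to get right but slow. The difficulty the C function deals with is doing the same thing a byte at a time while still handling a partial first and last byte.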