jhorstmann commented on pull request #8598:
URL: https://github.com/apache/arrow/pull/8598#issuecomment-723326224
When I introduced this initially in [ARROW-10040][1], one piece of feedback
was that big-endian was not supported yet anyway, so there was no need to worry
about it at that point. I think it could be made to work fairly easily by
calling `to_le` in 2-3 places, if I had access to a big-endian test machine or
CI pipeline.
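
To illustrate the kind of change I mean, here is a minimal sketch, not the actual code in this PR. The helper name `read_chunk_le` is made up for the example, and it assumes the bitmap is read as 64-bit chunks from a byte slice with bit 0 of the bitmap in the least significant bit of the first byte:

```rust
/// Hypothetical helper (not part of the arrow crate): reads the first eight
/// bytes of `bytes` as one 64-bit chunk of a little-endian bitmap.
/// Panics if `bytes` is shorter than eight bytes.
fn read_chunk_le(bytes: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[..8]);
    // `from_ne_bytes` stands in for a native-endian load of the chunk.
    // `to_le` is a no-op on little-endian targets and a byte swap on
    // big-endian ones, so bit 0 of the result is always bit 0 of `bytes[0]`.
    u64::from_ne_bytes(raw).to_le()
}
```

On every target this should give the same result as `u64::from_le_bytes(raw)`, which may be the simpler spelling when the chunk is already available as a byte array.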
Adding a dependency that already implements the chunking and remainder logic
is nice. I would have expected that to reduce the code size, though.
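
For readers unfamiliar with the pattern, "chunking and remainder" roughly means processing the bitmap 64 bits at a time and handling the trailing bytes separately. The following is only a sketch using the standard library's `chunks_exact`; the function name `count_set_bits` is hypothetical, and it is neither the existing implementation nor the new dependency's API:

```rust
/// Hypothetical example: counts the set bits in a bitmap by processing
/// full 64-bit chunks first and then the leftover bytes.
fn count_set_bits(bytes: &[u8]) -> u32 {
    let chunks = bytes.chunks_exact(8);
    // Bytes that do not fill a whole 64-bit chunk (at most seven).
    let remainder = chunks.remainder();

    let mut count = 0u32;
    for chunk in chunks {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(chunk);
        // One popcount per 64 bits instead of one per byte.
        count += u64::from_le_bytes(raw).count_ones();
    }
    for byte in remainder {
        count += byte.count_ones();
    }
    count
}
```

The real bit slice functions also have to deal with bit offsets that are not byte-aligned, which this sketch leaves out.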
The `buffer_bit_ops` microbenchmark is affected significantly and is now roughly 10x slower:
```
buffer_bit_ops and time: [1.1393 us 1.1413 us 1.1433 us]
change: [+889.05% +892.72% +896.41%] (p = 0.00 < 0.05)
Performance has regressed.
```
The sum aggregation kernel is another heavy user of the bit slice functions
and also regressed, by about 25%:
```
sum nulls 512 time: [305.83 ns 306.31 ns 306.82 ns]
change: [+25.194% +25.552% +25.936%] (p = 0.00 < 0.05)
Performance has regressed.
```
Most benchmarks don't seem to be affected much, probably because other
overhead dominates or because they do not use the chunked functions. Cast
kernels, for example, are implemented using iterators of optional values and
so take a different code path.
[1]: https://github.com/apache/arrow/pull/8262