jhorstmann commented on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-723326224
When I introduced this initially in [ARROW-10040][1], one piece of feedback was that big endian was not supported yet anyway, so it would not be necessary to worry about that now. I think it could be made to work rather easily by calling `to_le` in 2-3 places if I had access to a big endian test machine or CI pipeline.

Adding a dependency that already implements the chunking and remainder logic is nice. I would have expected that to reduce the code size, though.

The `buffer_bit_ops` microbenchmark seems to be affected quite a bit:

```
buffer_bit_ops and      time:   [1.1393 us 1.1413 us 1.1433 us]
                        change: [+889.05% +892.72% +896.41%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

The sum aggregation kernel is another bigger user of the bit slice functions and also regressed a bit:

```
sum nulls 512           time:   [305.83 ns 306.31 ns 306.82 ns]
                        change: [+25.194% +25.552% +25.936%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

Most benchmarks don't seem to be affected much, probably because there is some other overhead or they are not using the chunked functions. Cast kernels, for example, are implemented using iterators of optional values and so use a different code path.

[1]: https://github.com/apache/arrow/pull/8262
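
To illustrate the endianness point, here is a minimal sketch of reading 64 bits at an arbitrary bit offset from a bitmap whose bits are stored least-significant-bit first. This is not the code from this PR or from ARROW-10040: the helper name `read_u64_at_bit_offset` is hypothetical, and it uses `u64::from_le_bytes` rather than `to_le`, which is a related way of getting the same result on both little and big endian targets.

```rust
/// Hypothetical helper, not part of the arrow crate.
/// Reads 64 bits starting at `bit_offset` from an LSB-first bitmap.
///
/// Assumption: the caller guarantees 8 readable bytes from the starting
/// byte, plus one more byte when `bit_offset` is not a multiple of 8.
fn read_u64_at_bit_offset(bytes: &[u8], bit_offset: usize) -> u64 {
    let byte_offset = bit_offset / 8;
    let shift = bit_offset % 8;

    // Interpret the 8 bytes as little endian regardless of the host, so that
    // bit i of the word is always bit (i % 8) of byte (i / 8).
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[byte_offset..byte_offset + 8]);
    let word = u64::from_le_bytes(buf);

    if shift == 0 {
        word
    } else {
        // The top `shift` bits of the result come from the following byte.
        let next = bytes[byte_offset + 8] as u64;
        (word >> shift) | (next << (64 - shift))
    }
}
```

With a native-endian load instead, the shift would combine bits from the wrong bytes on a big endian machine, which is why a conversion is needed in the few places where words are loaded.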