jhorstmann opened a new issue, #8561:
URL: https://github.com/apache/arrow-rs/issues/8561
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
`BooleanBufferBuilder` / `BooleanBuilder` currently seem to serve two quite
distinct use cases, which makes it harder to get the best possible
performance for both of them:
- Building a new buffer by starting from an empty state and incrementally
appending new bits (`append_value`, `append_slice`, `append_packed_range` and
similar methods).
- Starting from a buffer that is initialized to all ones/zeros, or from a copy of an
existing buffer, and modifying individual bits in it (`set_bit`, `get_bit`).
(This is based on my analysis of the arrow-rs code only; the assumption should
also be verified against some bigger users such as DataFusion.)
The first use case can be optimized by collecting the bits to append in a `u64`
and only writing the corresponding bytes to memory once every 64 appended values.
Any capacity checks are thus amortized over those 64 values. On the other hand,
methods that get and set arbitrary bit positions would be somewhat less efficient to
implement in this scheme, since the target bit may still be pending in the `u64`
rather than already written to the buffer.
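A minimal sketch of the append-side idea, assuming a plain `Vec<u8>` as backing storage and Arrow's LSB-first bit order. `AppendOnlyBitBuilder` and its methods are hypothetical names for illustration, not existing arrow-rs API; the point is that `append_value` only touches the `Vec` (and pays its capacity check) once per 64 values.

```rust
/// Hypothetical append-only bit builder (not arrow-rs API).
/// Bits accumulate in a u64 and are flushed to `buffer` every 64 values.
#[derive(Default)]
pub struct AppendOnlyBitBuilder {
    buffer: Vec<u8>, // completed 64-bit chunks, stored little-endian
    current: u64,    // pending bits not yet written to `buffer`
    bit_len: usize,  // number of pending bits in `current` (0..64)
}

impl AppendOnlyBitBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append one bit; the Vec (and its capacity check) is only
    /// touched when 64 pending bits have been collected.
    #[inline]
    pub fn append_value(&mut self, v: bool) {
        self.current |= (v as u64) << self.bit_len;
        self.bit_len += 1;
        if self.bit_len == 64 {
            self.buffer.extend_from_slice(&self.current.to_le_bytes());
            self.current = 0;
            self.bit_len = 0;
        }
    }

    /// Total number of appended bits (flushed + pending).
    pub fn len(&self) -> usize {
        self.buffer.len() * 8 + self.bit_len
    }

    /// Flush the pending bits and return the LSB-first bytes and the bit length.
    pub fn finish(mut self) -> (Vec<u8>, usize) {
        let len = self.len();
        // Only write as many trailing bytes as the pending bits need.
        let tail_bytes = (self.bit_len + 7) / 8;
        self.buffer
            .extend_from_slice(&self.current.to_le_bytes()[..tail_bytes]);
        (self.buffer, len)
    }
}
```

Appending 70 values this way writes a single 8-byte chunk during the appends, and `finish` flushes the remaining 6 bits as one trailing byte. A `get_bit` on this type would first have to check whether the index falls into the pending `current` word, which is the extra cost mentioned above.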
The second use case should not need any logic to resize the buffer, and so
could be much simpler.
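For contrast, a sketch of the fixed-length, mutate-in-place use case. `FixedBitBuffer` is a hypothetical stand-in for the proposed `MutableBooleanBuffer`, again assuming a `Vec<u8>` with LSB-first bit order; since the length is fixed at construction, `set_bit` and `get_bit` need only a bounds check and never any resize logic.

```rust
/// Hypothetical fixed-length mutable bitmap (not arrow-rs API).
/// It never reallocates after construction.
pub struct FixedBitBuffer {
    bytes: Vec<u8>, // LSB-first bitmap; padding bits past `len` are not meaningful
    len: usize,     // number of valid bits
}

impl FixedBitBuffer {
    /// Create a buffer of `len` bits, all initialized to `value`.
    pub fn new_set(len: usize, value: bool) -> Self {
        let fill = if value { 0xFF } else { 0x00 };
        Self { bytes: vec![fill; (len + 7) / 8], len }
    }

    /// Create a buffer as a copy of an existing LSB-first bitmap.
    pub fn from_bytes(bytes: &[u8], len: usize) -> Self {
        assert!(bytes.len() * 8 >= len, "bitmap too short");
        Self { bytes: bytes[..(len + 7) / 8].to_vec(), len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Read bit `i`; only a bounds check, no capacity logic.
    pub fn get_bit(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.bytes[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Set or clear bit `i` in place.
    pub fn set_bit(&mut self, i: usize, v: bool) {
        assert!(i < self.len, "index out of bounds");
        let mask = 1u8 << (i % 8);
        if v {
            self.bytes[i / 8] |= mask;
        } else {
            self.bytes[i / 8] &= !mask;
        }
    }
}
```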
**Describe the solution you'd like**
I think it would make sense to also separate these use cases in code, by
introducing separate `BooleanBufferBuilder` and `MutableBooleanBuffer`
implementations.
Since this would involve breaking the existing API, it would probably have
to be done in multiple steps: first introducing a separate
`MutableBooleanBuffer` and deprecating the `set_bit` / `get_bit` functionality,
and only afterwards refactoring the `BooleanBuilder`.
**Describe alternatives you've considered**
**Additional context**
Noticed while looking into #8543