jhorstmann opened a new issue, #8561:
URL: https://github.com/apache/arrow-rs/issues/8561

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   `BooleanBufferBuilder` / `BooleanBuilder` seem to currently serve two quite 
distinct usecases, which makes it more difficult to get the best possible 
performance for both of them.
   
    - Building a new buffer by starting from an empty state and incrementally 
appending new bits (`append_value`, `append_slice`, `append_packed_range` and 
similar methods).
    - Starting from a buffer that is initialized to ones/zeroes or a copy of an 
existing buffer, modifying certain bits in it (`set_bit`, `get_bit`).
   
   (This is based on my analysis of the arrow-rs code only; the assumption 
should also be verified against some bigger users like datafusion.)
   
   The first usecase can be optimized by collecting bits to append in a `u64` 
and only appending the corresponding bytes to memory every 64 appended values. 
Any capacity checks are thus amortized over those 64 values. On the other hand, 
methods to get and set arbitrary bit positions would be a bit less efficient to 
implement in this scheme.
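
A minimal sketch of the accumulate-then-flush idea, to make the amortization concrete. This is not the arrow-rs implementation: the type `AppendOnlyBitBuilder` and its methods are hypothetical names, and a plain `Vec<u8>` stands in for arrow's buffer type. Bits are packed LSB-first, which is the bit order Arrow bitmaps use.

```rust
/// Hypothetical append-only bitmap builder (illustration only, not the arrow-rs API).
pub struct AppendOnlyBitBuilder {
    /// Bytes of all fully flushed 64-bit words, in little-endian order.
    bytes: Vec<u8>,
    /// Bits collected so far that have not yet been written to `bytes`.
    current: u64,
    /// Total number of bits appended.
    len: usize,
}

impl AppendOnlyBitBuilder {
    pub fn new() -> Self {
        Self { bytes: Vec::new(), current: 0, len: 0 }
    }

    /// Appends one bit. Memory is only touched on every 64th call.
    #[inline]
    pub fn append(&mut self, value: bool) {
        let bit = self.len % 64;
        self.current |= (value as u64) << bit;
        self.len += 1;
        if bit == 63 {
            // One capacity check and one 8-byte write per 64 appended values.
            self.bytes.extend_from_slice(&self.current.to_le_bytes());
            self.current = 0;
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Flushes the partially filled word and returns the packed bytes.
    pub fn finish(mut self) -> Vec<u8> {
        let rem = self.len % 64;
        if rem != 0 {
            // Only write as many bytes as the remaining bits occupy.
            let n = (rem + 7) / 8;
            self.bytes.extend_from_slice(&self.current.to_le_bytes()[..n]);
        }
        self.bytes
    }
}
```

Reading or overwriting bit `i` in this layout would first have to check whether `i` falls into the flushed bytes or into the pending `current` word, which is the extra cost for `get_bit` / `set_bit` mentioned above.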
   
   The second usecase should not need any logic to resize the buffer and so 
could be much simpler.
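
For comparison, a rough sketch of what the fixed-size variant might look like. The name `FixedBitmap` and its constructors are assumptions made for illustration (a stand-in for the proposed `MutableBooleanBuffer`, whose actual API is not defined here). The length is set at construction, so `get_bit` and `set_bit` need no resize logic at all.

```rust
/// Hypothetical fixed-length bitmap (illustration only, not an arrow-rs type).
pub struct FixedBitmap {
    bytes: Vec<u8>,
    len: usize,
}

impl FixedBitmap {
    /// Creates a bitmap of `len` bits, all initialized to `value`.
    pub fn new_filled(len: usize, value: bool) -> Self {
        let fill = if value { 0xFF } else { 0x00 };
        Self { bytes: vec![fill; (len + 7) / 8], len }
    }

    /// Creates a bitmap by copying the first `len` bits of an existing buffer.
    pub fn from_bytes(bytes: &[u8], len: usize) -> Self {
        assert!(len <= bytes.len() * 8, "source buffer too short");
        Self { bytes: bytes[..(len + 7) / 8].to_vec(), len }
    }

    /// Returns the bit at position `i` (LSB-first within each byte).
    pub fn get_bit(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    /// Sets the bit at position `i`; the buffer never grows.
    pub fn set_bit(&mut self, i: usize, value: bool) {
        assert!(i < self.len, "index out of bounds");
        let mask = 1u8 << (i % 8);
        if value {
            self.bytes[i / 8] |= mask;
        } else {
            self.bytes[i / 8] &= !mask;
        }
    }
}
```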
   
   **Describe the solution you'd like**
   
   I think it would make sense to also separate these usecases in code, by 
introducing separate `BooleanBufferBuilder` and `MutableBooleanBuffer` 
implementations.
   
   Since this would involve breaking the existing API, it would probably have 
to be done in multiple steps: first introduce a separate 
`MutableBooleanBuffer` and deprecate the `set_bit` / `get_bit` functionality, 
and only afterwards refactor the `BooleanBuilder`.
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   Noticed while looking into #8543
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
