ShiKaiWi commented on issue #6430: URL: https://github.com/apache/arrow-rs/issues/6430#issuecomment-2366834789
Thanks a lot for your responses and suggestions. @alamb @Rachelint In conclusion, the two problems and the proposed solutions to them are as follows: **1. Whether to clone the underlying buffer for the unaligned-sliced boolean array** > I think we should make conversion from array --> builder such that they don't copy if possible, but if necessary copy the underlying buffers. A sliced array is likely to be share so copying the bitmap to modify it will be needed anyways >I read codes today, and I found maybe we can use ~~`BooleanArray::sliced`~~`BooleanBuffer::sliced` to solve the offset problem. It may be not so efficient if found buffer not aligned(clone will be triggered), but can work currently. Basically, I agree with using [`BooleanBuffer::sliced`](https://docs.rs/arrow/latest/arrow/buffer/struct.BooleanBuffer.html#method.sliced) to get the underlying buffer, which may introduce clone. However, it makes me worried that in the implementation of [`PrimitiveArray::into_builder`](https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.into_builder) the error will be thrown if the underlying buffer is found shared, and here is the comment of it: ``` /// Returns a `PrimitiveBuilder` for this array, suitable for mutating values /// in place. /// /// # Buffer Reuse /// /// If the underlying data buffer has no other outstanding references, the /// buffer is used without copying. /// /// If the underlying data buffer does have outstanding references, returns /// `Err(self)` pub fn into_builder(self) -> Result<PrimitiveBuilder<T>, Self> { ... } ``` And if we allow the clone for `BooleanArray::into_builder` the consistency of `into_builder` between `PrimitiveArray::into_builder` and `BooleanArray::into_builder` will be broken, and will it be confusing? Actually, I'm in favor of totally avoiding copying for a not-shared sliced boolean array for performance and api consistency, but some breaking changes in the public api of `BooleanBufferBuilder` will be introduced... So considering the api consistency, shall we insist on allowing clone for sliced `BooleanArray`? **2. Allow direct manipulation of the underling boolean buffer** The proposed solutions are similar -- exposing the underlying two builders. However, the provided two builders are required to be manipulated correctly, which seems not user-friendly, if some bit needs to be set/unset. And I guess it would be better if we add a new method to the `BooleanBuilder`: ```rust pub fn set_bit(&mut self, index: usize, v: Option<bool>) { ... manipulating the `BooleanBufferBuilder` and `NullBufferBuilder`. } ``` So how about this proposal? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
