ShiKaiWi commented on issue #6430:
URL: https://github.com/apache/arrow-rs/issues/6430#issuecomment-2366834789

   Thanks a lot for your responses and suggestions. @alamb @Rachelint 
   
   In conclusion, the two problems and the proposed solutions to them are as 
follows:
   
   **1. Whether to clone the underlying buffer for the unaligned-sliced boolean 
array**
   
   > I think we should make conversion from array --> builder such that they 
don't copy if possible, but if necessary copy the underlying buffers. A sliced 
array is likely to be share so copying the bitmap to modify it will be needed 
anyways
   
   >I read codes today, and I found maybe we can use 
~~`BooleanArray::sliced`~~`BooleanBuffer::sliced` to solve the offset problem.
   It may be not so efficient if found buffer not aligned(clone will be 
triggered), but can work currently.
   
   Basically, I agree with using 
[`BooleanBuffer::sliced`](https://docs.rs/arrow/latest/arrow/buffer/struct.BooleanBuffer.html#method.sliced)
 to get the underlying buffer, which may introduce clone.
   
   However, it makes me worried that in the implementation of 
[`PrimitiveArray::into_builder`](https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.into_builder)
 the error will be thrown if the underlying buffer is found shared, and here is 
the comment of it:
   ```
       /// Returns a `PrimitiveBuilder` for this array, suitable for mutating 
values
       /// in place.
       ///
       /// # Buffer Reuse
       ///
       /// If the underlying data buffer has no other outstanding references, 
the
       /// buffer is used without copying.
       ///
       /// If the underlying data buffer does have outstanding references, 
returns
       /// `Err(self)`
       pub fn into_builder(self) -> Result<PrimitiveBuilder<T>, Self> {
       ...
       }
   ```
   And if we allow the clone for `BooleanArray::into_builder` the consistency 
of `into_builder` between `PrimitiveArray::into_builder` and 
`BooleanArray::into_builder` will be broken, and will it be confusing?
   
   Actually, I'm in favor of totally avoiding copying for a not-shared sliced 
boolean array for performance and api consistency, but some breaking changes in 
the public api of `BooleanBufferBuilder` will be introduced...
   
   So considering the api consistency, shall we insist on allowing clone for 
sliced `BooleanArray`? 
   
   **2. Allow direct manipulation of the underling boolean buffer**
   
   The proposed solutions are similar -- exposing the underlying two builders. 
However, the provided two builders are required to be manipulated correctly, 
which seems not user-friendly, if some bit needs to be set/unset. And I guess 
it would be better if we add a new method to the `BooleanBuilder`:
   ```rust
       pub fn set_bit(&mut self, index: usize, v: Option<bool>) {
       ... manipulating the `BooleanBufferBuilder` and `NullBufferBuilder`.
       }
   ```
   
   So how about this proposal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to