scovich commented on PR #9120: URL: https://github.com/apache/arrow-rs/pull/9120#issuecomment-3739206001
> `BooleanBufferBuilder::finish` takes `&mut self` and must allocate internal replacements for the consumed state, while `From` (as always) consumes `self`. Maybe it's worth documenting that pitfall on the `finish` method? I didn't even realize that `impl From` was a cheap/final alternative to `finish`, but it does make sense in retrospect. Maybe it's worth considering an alternative approach to the builder construction/finish protocol? Instead of allocating capacity on creation (and re-allocating on finish), allocate on first touch. Every append to a builder anyway has to do capacity checks, so the compiler should inline and optimize the branching so that the first-touch check is basically free in the common case where the allocation is already large enough? But looking at the code, `MutableBuffer::new(0)` calls [MutableBuffer::with_capacity](https://docs.rs/arrow-buffer/57.2.0/src/arrow_buffer/buffer/mutable.rs.html#116) which doesn't actually allocate any memory? Is the mutex allocation somehow expensive? ```rust pub fn with_capacity(capacity: usize) -> Self { let capacity = bit_util::round_upto_multiple_of_64(capacity); let layout = Layout::from_size_align(capacity, ALIGNMENT) .expect("failed to create layout for MutableBuffer"); let data = match layout.size() { 0 => dangling_ptr(), _ => { // Safety: Verified size != 0 let raw_ptr = unsafe { std::alloc::alloc(layout) }; NonNull::new(raw_ptr).unwrap_or_else(|| handle_alloc_error(layout)) } }; Self { data, len: 0, layout, #[cfg(feature = "pool")] reservation: std::sync::Mutex::new(None), } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
