kawadakk opened a new issue, #4549:
URL: https://github.com/apache/arrow-rs/issues/4549

   **Describe the bug**
   `FixedSizeListBuilder::new` allocates more capacity than necessary for the validity buffer.
   
   
https://github.com/apache/arrow-rs/blob/72cafde586af831d911473c6d1bbd56d2482cfdb/arrow-array/src/builder/fixed_size_list_builder.rs#L73-L78
   
   Line 77 pre-allocates `capacity` (`values_builder.len()`) elements in the validity buffer, but only `values_builder.len() / value_length` elements are needed to cover all the values already in `values_builder`. Each validity bit tracks one list of `value_length` values, not one individual value.
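
   The arithmetic can be sketched in plain Rust, without arrow. The helper names `validity_slots` and `validity_bytes` are hypothetical (they are not arrow-rs APIs). The sketch assumes one validity bit per slot and ignores any rounding or alignment padding the real buffer may add. It compares the current sizing (one bit per child value) with the expected sizing (one bit per list slot) for the numbers used in the reproducer below.

   ```rust
   /// Hypothetical helper: number of list slots (one validity bit each)
   /// needed to cover `values_len` child values when every list holds
   /// exactly `value_length` values.
   fn validity_slots(values_len: usize, value_length: usize) -> usize {
       assert!(value_length > 0, "value_length must be non-zero");
       values_len / value_length
   }

   /// Hypothetical helper: bytes needed for `bits` validity bits, rounded up.
   fn validity_bytes(bits: usize) -> usize {
       (bits + 7) / 8
   }

   fn main() {
       let values_len = 1024 * 1024; // values already in the child builder
       let value_length = 1024; // list size passed to `FixedSizeListBuilder::new`

       // Sizing described in this issue: one bit per child value.
       let current = validity_bytes(values_len);
       // Expected sizing: one bit per list slot.
       let expected = validity_bytes(validity_slots(values_len, value_length));

       println!("current:  {current} bytes"); // 131072
       println!("expected: {expected} bytes"); // 128
   }
   ```

   The sketch rejects `value_length == 0` because the division is undefined there. A real fix would have to decide how to handle that case.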
   
   **To Reproduce**
   
   ```rust
   use std::{
       alloc::GlobalAlloc,
       sync::atomic::{AtomicUsize, Ordering},
   };
   
   fn main() {
       let mut el_builder = arrow::array::UInt8Builder::with_capacity(1024 * 1024);
       for _ in 0..1024 * 1024 {
           el_builder.append_value(0);
       }
   
       MAX.store(0, Ordering::Relaxed); // ignore the allocation for `UInt8Builder`
   
       let mut builder = arrow::array::FixedSizeListBuilder::new(el_builder, 1024);
       for _ in 0..1024 * 1024 / 1024 {
           builder.append(false);
       }
   
       // Check the allocation size of `FixedSizeListBuilder` validity buffer
       assert!(dbg!(MAX.load(Ordering::Relaxed)) <= 1024 * 1024 / 1024 / 8 + arrow::alloc::ALIGNMENT);
   }
   
   struct Alloc(std::alloc::System);
   
   #[global_allocator]
   static _A: Alloc = Alloc(std::alloc::System);
   
   static MAX: AtomicUsize = AtomicUsize::new(0);
   
   unsafe impl GlobalAlloc for Alloc {
       unsafe fn alloc(&self, layout: std::alloc::Layout) -> *mut u8 {
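           // Remember the largest allocation. Avoid `dbg!` here: printing from
           // inside the global allocator can re-enter it if formatting or I/O allocates.
           MAX.fetch_max(layout.size(), Ordering::Relaxed);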
           self.0.alloc(layout)
       }
   
       unsafe fn dealloc(&self, ptr: *mut u8, layout: std::alloc::Layout) {
           self.0.dealloc(ptr, layout)
       }
   }
   ```
   
   **Expected behavior**
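   `FixedSizeListBuilder::new` should size the validity buffer for `values_builder.len() / value_length` list slots. In the reproducer above that is 1024 slots, so the largest allocation should stay within `1024 * 1024 / 1024 / 8` bytes (128 bytes) plus `arrow::alloc::ALIGNMENT`, and the assertion should pass.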
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.