alamb commented on code in PR #9093:
URL: https://github.com/apache/arrow-rs/pull/9093#discussion_r2679567468


##########
parquet/src/arrow/record_reader/buffer.rs:
##########
@@ -19,6 +19,12 @@ use crate::arrow::buffer::bit_util::iter_set_bits_rev;
 
 /// A buffer that supports padding with nulls
 pub trait ValuesBuffer: Default {

Review Comment:
   Another way to have the compiler help ensure that `ValuesBuffer` is 
always created with a capacity would be to remove this `Default` bound.
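
To illustrate the idea, here is a minimal sketch, not the actual arrow-rs code. The trait below is a simplified stand-in for `ValuesBuffer`, and the names `new_with_capacity` and `new_buffer` are hypothetical, introduced only for this example. Once `Default` is no longer a supertrait, generic code that only knows `B: ValuesBuffer` cannot call `B::default()` and has to pass a capacity instead.

```rust
/// Simplified, hypothetical stand-in for `ValuesBuffer` (the real trait has
/// other methods). Note that `Default` is not a supertrait here.
pub trait ValuesBuffer: Sized {
    /// Hypothetical constructor; the name is an assumption for this sketch.
    fn new_with_capacity(capacity: usize) -> Self;
}

impl<T> ValuesBuffer for Vec<T> {
    fn new_with_capacity(capacity: usize) -> Self {
        // Calls the inherent `Vec::with_capacity` from the standard library.
        Vec::with_capacity(capacity)
    }
}

/// Hypothetical helper standing in for code that is generic over the
/// buffer type.
pub fn new_buffer<B: ValuesBuffer>(batch_size: usize) -> B {
    // `B::default()` would not compile here, because `B: ValuesBuffer`
    // no longer implies `B: Default`; a capacity has to be supplied.
    B::new_with_capacity(batch_size)
}
```

Concrete types such as `Vec<T>` still implement `Default` on their own, so this only closes off the generic path; whether that is enough depends on how the readers construct their buffers.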



##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -119,6 +122,15 @@ impl<'a> ArrayReaderBuilder<'a> {
         self
     }
 
+    /// Set the batch size for pre-allocating internal buffers
+    ///
+    /// This allows the reader to pre-allocate buffers with the expected capacity,
+    /// avoiding reallocations when reading the first batch of data.
+    pub fn with_batch_size(mut self, batch_size: usize) -> Self {

Review Comment:
   Awesome -- this is a good way to test out the idea.
   
   Another thought I had (for the final PR) was to make the batch size 
mandatory (rather than an `Option`). While this is likely more code churn, it 
would let us use the compiler to verify that it is always specified.
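
As a rough sketch of what "mandatory" could look like (the `ReaderBuilder` type and its methods below are hypothetical and much simpler than the real `ArrayReaderBuilder`): the batch size becomes a required constructor argument stored as a plain `usize`, so a call site that forgets it fails to compile instead of silently falling back to an unsized buffer.

```rust
/// Hypothetical, simplified builder; not the real `ArrayReaderBuilder`.
pub struct ReaderBuilder {
    // A plain `usize` rather than `Option<usize>`: there is no "unset" state.
    batch_size: usize,
}

impl ReaderBuilder {
    /// The batch size is required up front, so every caller must supply it.
    pub fn new(batch_size: usize) -> Self {
        Self { batch_size }
    }

    pub fn batch_size(&self) -> usize {
        self.batch_size
    }

    /// Hypothetical stand-in for building a reader's internal value buffer.
    pub fn build_values(&self) -> Vec<i32> {
        // Pre-allocate for one full batch to avoid reallocating while the
        // first batch is read.
        Vec::with_capacity(self.batch_size)
    }
}
```

The trade-off is the churn mentioned above: every existing place that constructs the builder would need to be updated to pass a value.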


