alamb commented on code in PR #9093:
URL: https://github.com/apache/arrow-rs/pull/9093#discussion_r3021837200
##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -96,15 +96,26 @@ pub struct ArrayReaderBuilder<'a> {
parquet_metadata: Option<&'a ParquetMetaData>,
/// metrics
metrics: &'a ArrowReaderMetrics,
+ /// Batch size for pre-allocating internal buffers
+ batch_size: usize,
}
impl<'a> ArrayReaderBuilder<'a> {
- pub fn new(row_groups: &'a dyn RowGroups, metrics: &'a ArrowReaderMetrics) -> Self {
+ /// Create a new `ArrayReaderBuilder`
+ ///
+ /// `batch_size` is used to pre-allocate internal buffers with the expected capacity,
+ /// avoiding reallocations when reading the first batch of data.
+ pub fn new(
+ row_groups: &'a dyn RowGroups,
+ metrics: &'a ArrowReaderMetrics,
+ batch_size: usize,
Review Comment:
This is a public API, so adding a parameter to `new` is a breaking API change.
Maybe we could avoid changing the existing signature by adding a new `with_`
method instead, something like
```rust
let reader = ArrayReaderBuilder::new(row_groups, metrics)
    .with_batch_size(batch_size);
```
🤔
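
To make the suggestion concrete, here is a minimal sketch of how such a `with_batch_size` method could look. It is not the code in the PR: `RowGroups` and `ArrowReaderMetrics` below are empty stand-in types, `DEFAULT_BATCH_SIZE` and its value are assumptions for illustration, and the `batch_size()` accessor is hypothetical.

```rust
/// Stand-in for the real row group source (hypothetical, for illustration).
pub struct RowGroups;
/// Stand-in for the real metrics type (hypothetical, for illustration).
pub struct ArrowReaderMetrics;

/// Assumed default used when the caller does not set a batch size.
/// The value is illustrative and not taken from the PR.
pub const DEFAULT_BATCH_SIZE: usize = 1024;

#[allow(dead_code)]
pub struct ArrayReaderBuilder<'a> {
    row_groups: &'a RowGroups,
    metrics: &'a ArrowReaderMetrics,
    /// Batch size for pre-allocating internal buffers
    batch_size: usize,
}

impl<'a> ArrayReaderBuilder<'a> {
    /// Same two-argument signature as before, so existing callers keep compiling.
    pub fn new(row_groups: &'a RowGroups, metrics: &'a ArrowReaderMetrics) -> Self {
        Self {
            row_groups,
            metrics,
            batch_size: DEFAULT_BATCH_SIZE,
        }
    }

    /// Opt-in override: consumes the builder and returns it with the new batch size.
    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    /// Hypothetical accessor, added here only so the effect can be observed.
    pub fn batch_size(&self) -> usize {
        self.batch_size
    }
}
```

With this shape, callers that never mention a batch size are unaffected, and callers that care can chain `.with_batch_size(n)` after `new`.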