adriangb commented on code in PR #9066:
URL: https://github.com/apache/arrow-rs/pull/9066#discussion_r2679883919


##########
parquet/src/arrow/array_reader/cached_array_reader.rs:
##########
@@ -152,9 +168,17 @@ impl CachedArrayReader {
 
         let array = self.inner.consume_batch()?;
 
-        // Capture definition and repetition levels from inner reader before 
they are cleared
-        let def_levels = self.inner.get_def_levels().map(|l| l.to_vec());
-        let rep_levels = self.inner.get_rep_levels().map(|l| l.to_vec());
+        // Capture definition levels from inner reader only when needed 
(def_level > 0).
+        // This avoids unnecessary allocations for columns without nullable 
ancestors.
+        // Repetition levels are never copied because caching is only enabled 
for
+        // columns with rep_level == 0 (non-nested, non-repeated columns).
+        let def_levels = if self.needs_def_levels {
+            self.inner.get_def_levels().map(|l| l.to_vec())
+        } else {
+            None
+        };

Review Comment:
   @alamb this was the best I could easily do: `get_def_levels()` returns an 
`Option<&[u16]>` and we're storing this on structures that can't easily have a 
lifetime (without a major refactor at least) so we have to allocate an owned 
`Vec`



##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -162,6 +162,7 @@ impl<'a> ArrayReaderBuilder<'a> {
                         col_idx,
                         cache_options.role,
                         self.metrics.clone(), // cheap clone
+                        field.def_level > 0,  // needs_def_levels: true if has 
nullable ancestors

Review Comment:
   Alternatively could pass `field: &ParquetField` or `def_level: i16` and have 
`CachedArrayReader::new` figure out how to use that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to