alamb commented on code in PR #9848:
URL: https://github.com/apache/arrow-rs/pull/9848#discussion_r3290935306


##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -230,12 +234,17 @@ impl<'a> ArrayReaderBuilder<'a> {
         &self,
         field: &ParquetField,

Review Comment:
   Since we have to change all the callsites anyways to plumb a new argument 
down, what do you think about making a struct to make it easier in the future 
(and give us a clear place to document what `padding_threshold` means?
   
   It might also make it easier to find the callsite that padding_threshold was 
added
   
   Something like
   
   ```rust
   struct ReaderArgs<'a> {
     /// What field the output corresponds to 
     field: &ParquetField,
     /// Which columns are being read
     mask: &ProjectionMask, 
     /// To be documented here
     padding_threshold: Option<i16>,
   }
   ```
   
   If you like this idea, we could try and do it as a preparatory / mechanical 
PR and then this one would be a lot smaller
   
   Then instead of
   ```rust
                       self.build_list_reader(field, mask, padding_threshold)
   ```
   
   It woudl end up like
   
   ```rust
                       self.build_list_reader(reader_args)
   ```
   
   



##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -113,9 +113,7 @@ impl<'a> ArrayReaderBuilder<'a> {
         }
     }
 
-    /// Set the batch size used to pre-allocate internal buffers.
-    ///
-    /// This avoids reallocations when reading the first batch of data.

Review Comment:
   any reason to remove this comment?



##########
parquet/src/arrow/array_reader/byte_view_array.rs:
##########
@@ -36,14 +36,12 @@ use bytes::Bytes;
 use std::any::Any;
 
 /// Returns an [`ArrayReader`] that decodes the provided byte array column to 
view types.
-///
-/// `batch_size` is used to pre-allocate internal buffers,

Review Comment:
   why remove the comments?



##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -230,12 +234,17 @@ impl<'a> ArrayReaderBuilder<'a> {
         &self,
         field: &ParquetField,

Review Comment:
   maybe also it needs `batch_size`



##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -230,12 +234,17 @@ impl<'a> ArrayReaderBuilder<'a> {
         &self,
         field: &ParquetField,
         mask: &ProjectionMask,
+        padding_threshold: Option<i16>,
     ) -> Result<Option<Box<dyn ArrayReader>>> {
         let children = field.children().unwrap();
         assert_eq!(children.len(), 2);
 
-        let key_reader = self.build_reader(&children[0], mask)?;
-        let value_reader = self.build_reader(&children[1], mask)?;
+        // The map is internally List(Struct(key, value)). The key/value
+        // readers are children of the struct inside the list, so their
+        // padding threshold is the map's (= list's) def_level.
+        let child_threshold = Some(field.def_level);

Review Comment:
   With a `ReaderArgs` structure I can imagine this looks like
   
   
   ```rust
       fn build_map_reader(args: ReaderArgs<'_>) -> Result<Option<Box<dyn 
ArrayReader>>> {
         ... 
         // The map is internally List(Struct(key, value)). The key/value
         // readers are children of the struct inside the list, so their
         // padding threshold is the map's (= list's) def_level.
         let args = args.with_padding_threshold(Some(field.def_level));
         
         let key_reader = self.build_reader(args.with_field(&children[0]));
         let value_reader = self.build_reader(args.with_field(&children[1]));
   ```
   
   Though maybe this is too magical 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to