alamb commented on code in PR #2452:
URL: https://github.com/apache/arrow-rs/pull/2452#discussion_r945907800


##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -400,22 +400,46 @@ impl<T: ChunkReader + 'static> 
ArrowReaderBuilder<SyncReader<T>> {
         Self::new_builder(SyncReader(reader), metadata, options)
     }
 
+    /// Build a [`ParquetRecordBatchReader`]
+    ///
+    /// Note: this will eagerly evaluate any [`RowFilter`] before returning
     pub fn build(self) -> Result<ParquetRecordBatchReader> {
         let reader =
             FileReaderRowGroupCollection::new(Arc::new(self.input.0), 
self.row_groups);
+
+        let mut filter = self.filter;
+        let mut selection = self.selection;
+
+        if let Some(filter) = filter.as_mut() {

Review Comment:
   I don't understand the `mut` here -- it seems off to me because I wouldn't 
expect that the filter / predicate contains state that could be modified. Maybe 
I misunderstand something



##########
parquet/src/arrow/async_reader.rs:
##########
@@ -283,9 +283,6 @@ where
         batch_size: usize,
     ) -> ReadResult<T> {
         // TODO: calling build_array multiple times is wasteful

Review Comment:
   is this TODO still valid?



##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -1047,8 +1076,29 @@ mod tests {
         enabled_statistics: EnabledStatistics,
         /// Encoding
         encoding: Encoding,
-        //row selections and total selected row count
+        /// row selections and total selected row count
         row_selections: Option<(RowSelection, usize)>,
+        /// row filter
+        row_filter: Option<Vec<bool>>,
+    }
+
+    impl std::fmt::Debug for TestOptions {

Review Comment:
   ```suggestion
       // Manually implement this, to avoid printing entire contents of 
row_selections and row_filter
       impl std::fmt::Debug for TestOptions {
   ```



##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -1121,6 +1174,19 @@ mod tests {
             }
         }
 
+        fn with_row_filter(self) -> Self {

Review Comment:
   TIL "chuffed": https://www.merriam-webster.com/dictionary/chuffed (not 
common in American usage 😆 )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to