Ted-Jiang commented on code in PR #3780:
URL: https://github.com/apache/arrow-datafusion/pull/3780#discussion_r995554351


##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -785,6 +902,57 @@ impl<'a> PruningStatistics for 
RowGroupPruningStatistics<'a> {
     }
 }
 
+impl<'a> PruningStatistics for PagesPruningStatistics<'a> {
+    fn min_values(&self, column: &Column) -> Option<ArrayRef> {
+        get_min_max_values_form_page_index!(self, column, min)
+    }
+
+    fn max_values(&self, column: &Column) -> Option<ArrayRef> {
+        get_min_max_values_form_page_index!(self, column, max)
+    }
+
+    fn num_containers(&self) -> usize {
+        self.offset_indexes.get(self.col_id).unwrap().len()

Review Comment:
   Agree! as you mention:
   >It also means that the "number of containers" in Column A will be "2" (as 
it as pages 1 and 2) and the number of containers for Column B will be "3" (as 
it has pages 3, 4, 5).
   
   we should base on the *column identification* to return  the number of 
containers, but hard to do this base on this method
   
https://github.com/apache/arrow-datafusion/blob/e7edac5b66f8b929382a582021165f97392bd50c/datafusion/core/src/physical_optimizer/pruning.rs#L81-L83
   
   I will modify this in following pr and try support multi cols predicates.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to