Ted-Jiang commented on code in PR #3780:
URL: https://github.com/apache/arrow-datafusion/pull/3780#discussion_r995554351
##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -785,6 +902,57 @@ impl<'a> PruningStatistics for RowGroupPruningStatistics<'a> {
}
}
+impl<'a> PruningStatistics for PagesPruningStatistics<'a> {
+ fn min_values(&self, column: &Column) -> Option<ArrayRef> {
+ get_min_max_values_form_page_index!(self, column, min)
+ }
+
+ fn max_values(&self, column: &Column) -> Option<ArrayRef> {
+ get_min_max_values_form_page_index!(self, column, max)
+ }
+
+ fn num_containers(&self) -> usize {
+ self.offset_indexes.get(self.col_id).unwrap().len()
Review Comment:
Agreed! As you mentioned:
> It also means that the "number of containers" in Column A will be "2" (as it has pages 1 and 2) and the number of containers for Column B will be "3" (as it has pages 3, 4, 5).
We should return the number of containers per column (based on the *column identifier*), but that is hard to do with the current method, since `num_containers(&self)` takes no column argument:
https://github.com/apache/arrow-datafusion/blob/e7edac5b66f8b929382a582021165f97392bd50c/datafusion/core/src/physical_optimizer/pruning.rs#L81-L83
I will change this in a follow-up PR and try to support predicates on multiple columns.
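To illustrate the mismatch, here is a minimal, self-contained sketch in plain Rust (no Arrow/Parquet dependencies). `PageStats`, `PagesStatsSketch`, `keep_pages_gt`/`keep_pages_lt`, and the column-aware `num_containers(&self, column)` are all hypothetical names for illustration, not DataFusion's API. The point is only that, within one row group, each column has its own page count, so a single column-agnostic container count cannot describe both columns.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for a Parquet page index: for each column,
// the (min, max) statistics of every page. These are NOT DataFusion/parquet types.
struct PageStats {
    min: i64,
    max: i64,
}

struct PagesStatsSketch {
    pages: HashMap<&'static str, Vec<PageStats>>,
}

impl PagesStatsSketch {
    // Column-aware container count: each column reports its own page count,
    // unlike a column-agnostic `num_containers(&self) -> usize`.
    fn num_containers(&self, column: &str) -> Option<usize> {
        self.pages.get(column).map(|p| p.len())
    }

    // Keep/skip mask for `column > value`: a page can only match if its max exceeds `value`.
    fn keep_pages_gt(&self, column: &str, value: i64) -> Option<Vec<bool>> {
        self.pages
            .get(column)
            .map(|p| p.iter().map(|s| s.max > value).collect())
    }

    // Keep/skip mask for `column < value`: a page can only match if its min is below `value`.
    fn keep_pages_lt(&self, column: &str, value: i64) -> Option<Vec<bool>> {
        self.pages
            .get(column)
            .map(|p| p.iter().map(|s| s.min < value).collect())
    }
}

// Column "a" has 2 pages and column "b" has 3 pages in the same row group,
// mirroring the example quoted above.
fn example() -> PagesStatsSketch {
    let mut pages = HashMap::new();
    pages.insert("a", vec![PageStats { min: 0, max: 9 }, PageStats { min: 10, max: 19 }]);
    pages.insert(
        "b",
        vec![
            PageStats { min: 0, max: 5 },
            PageStats { min: 6, max: 15 },
            PageStats { min: 16, max: 25 },
        ],
    );
    PagesStatsSketch { pages }
}

fn main() {
    let stats = example();
    let a = stats.keep_pages_lt("a", 5).unwrap();
    let b = stats.keep_pages_gt("b", 10).unwrap();
    // The masks have different lengths, so `a < 5 AND b > 10` cannot be
    // combined element-wise over one shared container count.
    println!("a: {:?} ({} pages)", a, stats.num_containers("a").unwrap());
    println!("b: {:?} ({} pages)", b, stats.num_containers("b").unwrap());
}
```

Here the mask for `a` has 2 entries and the mask for `b` has 3, which is why a multi-column predicate would need page counts (and row ranges) resolved per column rather than from a single `col_id`.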
--
This is an automated message from the Apache Git Service.