Ted-Jiang commented on code in PR #3780:
URL: https://github.com/apache/arrow-datafusion/pull/3780#discussion_r992218989
##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -785,6 +902,57 @@ impl<'a> PruningStatistics for RowGroupPruningStatistics<'a> {
}
}
+impl<'a> PruningStatistics for PagesPruningStatistics<'a> {
+ fn min_values(&self, column: &Column) -> Option<ArrayRef> {
+ get_min_max_values_form_page_index!(self, column, min)
+ }
+
+ fn max_values(&self, column: &Column) -> Option<ArrayRef> {
+ get_min_max_values_form_page_index!(self, column, max)
+ }
+
+ fn num_containers(&self) -> usize {
+ self.offset_indexes.get(self.col_id).unwrap().len()
Review Comment:
@alamb PTAL 🤔 For now, `num_containers` returns only one value.
I think we should change its signature to
```
fn num_containers(&self, column: &Column) -> usize {
```
because each column chunk in a row group can have a different number of pages.
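
To illustrate the idea, here is a minimal standalone sketch, not the DataFusion API. `Column`, `PerColumnContainers`, `PageCounts`, and `example_page_counts` are hypothetical names made up for this example, and returning `Option<usize>` instead of `usize` is my own assumption to avoid an `unwrap` on an unknown column.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the column reference used by the pruning API.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Column {
    name: String,
}

/// Hypothetical, trimmed-down trait containing only the method under discussion.
trait PerColumnContainers {
    /// Number of containers (pages) for `column`, or `None` if the column is unknown.
    fn num_containers(&self, column: &Column) -> Option<usize>;
}

/// Page counts per column chunk within a single row group.
struct PageCounts {
    pages_per_column: HashMap<Column, usize>,
}

impl PerColumnContainers for PageCounts {
    fn num_containers(&self, column: &Column) -> Option<usize> {
        // Each column chunk reports its own page count.
        self.pages_per_column.get(column).copied()
    }
}

/// Builds example data: column "a" has 3 pages, column "b" has 7.
fn example_page_counts() -> PageCounts {
    let mut pages_per_column = HashMap::new();
    pages_per_column.insert(Column { name: "a".to_string() }, 3);
    pages_per_column.insert(Column { name: "b".to_string() }, 7);
    PageCounts { pages_per_column }
}
```

With a per-column lookup like this, two columns in the same row group can report different container counts, which a single `num_containers(&self)` cannot express.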
##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -460,6 +498,20 @@ impl FileOpener for ParquetOpener {
}
}
+// Check PruningPredicates just work on one column.
+fn check_page_index_push_down_valid(predicate: &Option<PruningPredicate>) -> bool {
+ if let Some(predicate) = predicate {
+ // for now we only support pushDown on one col, because each col may have different page numbers, its hard to get
+ // `num_containers` from `PruningStatistics`.
+ let cols = predicate.need_input_columns_ids();
+ //Todo more specific rules
Review Comment:
For now, because of `num_containers`, we support only one column. In the
future, we could add more specific rules.
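
A rough sketch of the single-column rule, independent of DataFusion types. The function names are hypothetical, the slice of column ids stands in for whatever `need_input_columns_ids()` returns, and treating a missing predicate as "not valid" is an assumption, since the rest of the function body is not shown in the hunk above.

```rust
use std::collections::HashSet;

/// Hypothetical helper: true when the ids refer to exactly one distinct column.
fn references_single_column(column_ids: &[usize]) -> bool {
    let distinct: HashSet<usize> = column_ids.iter().copied().collect();
    distinct.len() == 1
}

/// Hypothetical mirror of the check in the diff.
/// Assumption: with no predicate there is nothing to push down, so return false.
fn page_index_push_down_valid(column_ids: Option<&[usize]>) -> bool {
    match column_ids {
        Some(ids) => references_single_column(ids),
        None => false,
    }
}
```

Deduplicating the ids first means a predicate such as `a > 1 AND a < 10` still counts as a single-column predicate, while `a > 1 AND b < 10` is rejected.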