xudong963 commented on code in PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#discussion_r2667135142
##########
datafusion/datasource-parquet/src/row_group_filter.rs:
##########
@@ -153,6 +218,68 @@ impl RowGroupAccessPlanFilter {
}
}
+ /// Identifies row groups that are fully matched by the predicate.
+ ///
+ /// This optimization checks whether all rows in a row group satisfy the predicate
+ /// by inverting the predicate and checking if it prunes the row group. If the
+ /// inverted predicate prunes a row group, it means no rows match the inverted
+ /// predicate, which implies all rows match the original predicate.
+ ///
+ /// Note: This optimization is relatively inexpensive for a limited number of row groups.
+ fn identify_fully_matched_row_groups(
+ &mut self,
+ candidate_row_group_indices: &[usize],
+ arrow_schema: &Schema,
+ parquet_schema: &SchemaDescriptor,
+ groups: &[RowGroupMetaData],
+ predicate: &PruningPredicate,
+ metrics: &ParquetFileMetrics,
+ ) {
+ if candidate_row_group_indices.is_empty() {
+ return;
+ }
+
+ // Use NotExpr to create the inverted predicate
+ let inverted_expr =
+ Arc::new(NotExpr::new(Arc::clone(predicate.orig_expr())));
+
+ // Simplify the NOT expression (e.g., NOT(c1 = 0) -> c1 != 0)
+ // before building the pruning predicate
+ let simplifier = PhysicalExprSimplifier::new(arrow_schema);
Review Comment:
Yes, we can, but the `pub fn prune_by_statistics` API would need to change.
And if we want to do https://github.com/apache/datafusion/issues/19028
later, it looks like we would still need this logic even when there is no limit.
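
For anyone following the thread, the inversion idea described in the doc comment above can be sketched in isolation. This is a toy model, not the DataFusion implementation: `Stats`, `Pred`, `Outcome`, `fully_matched`, and `classify` are hypothetical names, the predicate is limited to a single integer column, and the column is assumed to contain no nulls.

```rust
/// Min/max statistics for one column in one row group.
/// Hypothetical, simplified stand-in for Parquet row group statistics.
#[derive(Debug, Clone, Copy)]
struct Stats {
    min: i64,
    max: i64,
}

/// A toy predicate on a single integer column (assumed to have no nulls).
#[derive(Debug, Clone, Copy)]
enum Pred {
    /// col > v
    Gt(i64),
    /// col <= v
    LtEq(i64),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Statistics prove that no row matches the predicate.
    Pruned,
    /// Statistics prove that every row matches the predicate.
    FullyMatched,
    /// Statistics cannot decide; the rows must be evaluated.
    Partial,
}

impl Pred {
    /// Logical negation: NOT(col > v) is col <= v, and vice versa.
    fn invert(self) -> Pred {
        match self {
            Pred::Gt(v) => Pred::LtEq(v),
            Pred::LtEq(v) => Pred::Gt(v),
        }
    }

    /// True when the statistics prove that no row can satisfy the predicate.
    fn prunes(self, s: Stats) -> bool {
        match self {
            Pred::Gt(v) => s.max <= v,
            Pred::LtEq(v) => s.min > v,
        }
    }
}

/// If the inverted predicate prunes the row group, no row matches the
/// inverted predicate, so every row matches the original one.
fn fully_matched(pred: Pred, s: Stats) -> bool {
    pred.invert().prunes(s)
}

/// Classify each row group using only its statistics.
fn classify(pred: Pred, groups: &[Stats]) -> Vec<Outcome> {
    groups
        .iter()
        .map(|&s| {
            if pred.prunes(s) {
                Outcome::Pruned
            } else if fully_matched(pred, s) {
                Outcome::FullyMatched
            } else {
                Outcome::Partial
            }
        })
        .collect()
}
```

With `Pred::Gt(5)`, a row group with statistics `[6, 10]` is fully matched, `[0, 5]` is pruned, and `[3, 10]` stays undecided. In this toy model the second check only runs for row groups that survived normal pruning, which mirrors the candidate list passed to `identify_fully_matched_row_groups`.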
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]