adriangb commented on code in PR #18644:
URL: https://github.com/apache/datafusion/pull/18644#discussion_r2544491321
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -418,6 +504,13 @@ pub struct AggregateExec {
/// Describes how the input is ordered relative to the group by columns
input_order_mode: InputOrderMode,
cache: PlanProperties,
+ /// During initialization, if the plan supports dynamic filtering (see
[`AggrDynFilter`]),
+ /// it is set to `Some(..)` regardless of whether it can be pushed down to
a child node.
+ ///
+ /// During filter pushdown optimization, if a child node can accept this
filter,
+ /// it remains `Some(..)` to enable dynamic filtering during aggregate
execution;
+ /// otherwise, it is cleared to `None`.
Review Comment:
> One thing I feel also ambiguous is when a child replies, (e.g. for
parquet) do we differentiate if they're using the dynamic filter for stat
pruning, or evaluate filter row by row
That's the crux of the issue:
https://github.com/apache/datafusion/blob/068f96f04906dac223246d6f3608196f194b864f/datafusion/datasource-parquet/src/source.rs#L767-L774
So currently Parquet says `No` even when it stores a copy of the filters for
stats pruning. So for this PR we should change it to always update the filters
even if the child responded `No`.
If Parquet responded `Yes` instead it would possibly lead to incorrect
results if it can't actually evaluate the filter row by row.
For operators like `HashJoinExec` that may want to push down an expression
that cannot be used for stats pruning the "wasted effort" would be if they
still do compute to update the filters but then ParquetSource can't use them
for stats pruning and doesn't push them down row-by-row.
So in summary:
1. I think this PR should always push down the filters even if the answer it
got back is `No` because ParquetSource may still use them for stats pruning.
2. We should make a ticket to improve the system to have 3 states
`Exact/Inexact/Unsupported`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]