Re: [PR] ensure dynamic filters are correctly pushed down through aggregations [datafusion]

via GitHub Thu, 26 Mar 2026 12:36:19 -0700


jayshrivastava commented on code in PR #21059:
URL: https://github.com/apache/datafusion/pull/21059#discussion_r2997257067



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1473,11 +1473,12 @@ impl ExecutionPlan for AggregateExec {
         // This optimization is NOT safe for filters on aggregated columns (like filtering on
         // the result of SUM or COUNT), as those require computing all groups first.
 
-        let grouping_columns: HashSet<_> = self
-            .group_by
-            .expr()
-            .iter()
-            .flat_map(|(expr, _)| collect_columns(expr))
+        // Build grouping columns using output indices because parent filters reference the AggregateExec's output schema,
+        // where grouping columns come first. The grouping expressions reference
+        // input columns, which may not match the output schema.
+        let output_schema = self.schema();
+        let grouping_columns: HashSet<_> = (0..self.group_by.expr().len())
+            .map(|i| Column::new(output_schema.field(i).name(), i))

Review Comment:
   I linked `AggregateExec` itself and `create_schema`. Every path that creates an
   `AggregateExec` produces this column ordering (grouping columns first,
   followed by the aggregate columns).
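
   To illustrate the index mismatch the diff addresses, here is a minimal, std-only sketch. It is not DataFusion code: the `Column` struct is a simplified stand-in for the physical `Column` (name plus index), and the schema, the two helper functions, and the query in the comments are hypothetical. It assumes, as discussed above, that the grouping columns occupy the first positions of the aggregate's output schema.

```rust
use std::collections::HashSet;

// Simplified stand-in for a physical column reference: a name plus a schema index.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Column {
    name: String,
    index: usize,
}

impl Column {
    fn new(name: &str, index: usize) -> Self {
        Self { name: name.to_string(), index }
    }
}

// Old approach (hypothetical helper): take the columns referenced by the
// grouping expressions. Their indices refer to the aggregate's INPUT schema.
fn grouping_columns_from_input(group_exprs: &[Column]) -> HashSet<Column> {
    group_exprs.iter().cloned().collect()
}

// New approach (hypothetical helper): take the first `num_group_exprs` fields
// of the OUTPUT schema, assuming grouping columns come first there.
fn grouping_columns_from_output(
    output_fields: &[&str],
    num_group_exprs: usize,
) -> HashSet<Column> {
    (0..num_group_exprs)
        .map(|i| Column::new(output_fields[i], i))
        .collect()
}

fn main() {
    // Hypothetical input schema: [a, b, c].
    // Query shape: SELECT c, SUM(a) ... GROUP BY c
    let group_exprs = vec![Column::new("c", 2)]; // `c` is at input index 2

    // Output schema of the aggregate: [c, sum_a].
    let output_fields = ["c", "sum_a"];

    // A parent filter on `c` refers to the output schema, so it uses index 0.
    let filter_col = Column::new("c", 0);

    let from_input = grouping_columns_from_input(&group_exprs);
    let from_output = grouping_columns_from_output(&output_fields, group_exprs.len());

    // The input-based set holds (c, 2), so the filter column (c, 0) is not found.
    println!("input-based match:  {}", from_input.contains(&filter_col));
    // The output-based set holds (c, 0), so the filter column is found.
    println!("output-based match: {}", from_output.contains(&filter_col));
}
```

   In this sketch the input-based set misses the filter's column because the indices differ even though the names agree, which is the case the output-index construction is meant to handle.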



