wirybeaver commented on issue #1359: URL: https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-4552223894
Two upstream DataFusion PRs have been opened to fix physical-optimizer rules that are not idempotent, which directly impacts AQE replanning:

1. **[fix(physical-optimizer): make OutputRequirements idempotent](https://github.com/apache/datafusion/pull/22522)**: `OutputRequirements::new_add_mode()` stacks an additional `OutputRequirementExec` wrapper on every re-optimization pass. The fix adds a guard in `require_top_ordering()` to return the plan unchanged when it is already topped by `OutputRequirementExec`.
2. **[fix(physical-plan): make HashJoinExec dynamic filter pushdown idempotent](https://github.com/apache/datafusion/pull/22523)**: `FilterPushdown::new_post_optimization()` unconditionally creates a new `DynamicFilterPhysicalExpr` on each pass and ANDs it onto the probe-side scan's predicate, producing `DynamicFilter AND DynamicFilter AND ...` after N replans. The fix skips dynamic-filter creation when `HashJoinExec` already carries one from a previous pass.

These were the only two offenders found by running a discovery harness that applies each rule in `PhysicalOptimizer::new().rules` twice on a set of plan fixtures and compares the output structurally.

Once these land in a DataFusion release, `WarnOnDuplicateExecRule` in Ballista's AQE chain should no longer fire on standard workloads (verified against a multi-stage aggregation fixture and TPC-H-style plans).

A Ballista-side regression test has also been prepared on the `phase-c-aqe-idempotence-regression` branch.
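
For readers who want the shape of the check, here is a minimal sketch of an "apply twice and compare" idempotence harness, together with the kind of guard the first PR describes. It is a toy model under stated assumptions: `Plan`, `Rule`, `StackingRule`, `GuardedRule`, and `non_idempotent_rules` are hypothetical names invented for illustration. They are not DataFusion's or Ballista's types, and this is not the actual harness or the code in either PR.

```rust
// Toy plan tree. `Requirement` is a hypothetical stand-in for a wrapper
// node such as OutputRequirementExec; it is NOT the real DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Requirement(Box<Plan>),
}

// Hypothetical, simplified optimizer-rule interface (assumption: the real
// trait also takes configuration and returns a Result).
trait Rule {
    fn name(&self) -> &'static str;
    fn optimize(&self, plan: Plan) -> Plan;
}

// Non-idempotent: wraps the plan unconditionally, so every pass adds
// another layer on top of the previous one.
struct StackingRule;

impl Rule for StackingRule {
    fn name(&self) -> &'static str {
        "stacking"
    }
    fn optimize(&self, plan: Plan) -> Plan {
        Plan::Requirement(Box::new(plan))
    }
}

// Idempotent: returns the plan unchanged when it is already topped by
// the wrapper, otherwise wraps it once.
struct GuardedRule;

impl Rule for GuardedRule {
    fn name(&self) -> &'static str {
        "guarded"
    }
    fn optimize(&self, plan: Plan) -> Plan {
        if matches!(plan, Plan::Requirement(_)) {
            return plan;
        }
        Plan::Requirement(Box::new(plan))
    }
}

// Applies each rule twice to every fixture and reports the rules whose
// second pass changes the output of the first pass.
fn non_idempotent_rules(rules: &[Box<dyn Rule>], fixtures: &[Plan]) -> Vec<&'static str> {
    let mut offenders = Vec::new();
    for rule in rules {
        let unstable = fixtures.iter().any(|fixture| {
            let once = rule.optimize(fixture.clone());
            let twice = rule.optimize(once.clone());
            once != twice // structural comparison via PartialEq
        });
        if unstable {
            offenders.push(rule.name());
        }
    }
    offenders
}
```

Calling `non_idempotent_rules` with both rules and a single `Plan::Scan` fixture reports only the stacking rule. Comparing the first-pass output with the second-pass output, rather than with the original fixture, is what makes this a check of idempotence and not merely of whether the rule changes the plan.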
