wirybeaver commented on issue #1359:
URL: https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-4552223894

   Two upstream DataFusion PRs have been opened to fix physical-optimizer rules 
that are not idempotent. This matters for AQE replanning, which re-runs the 
optimizer on a plan that has already been optimized:
   
   1. **[fix(physical-optimizer): make OutputRequirements 
idempotent](https://github.com/apache/datafusion/pull/22522)** — 
`OutputRequirements::new_add_mode()` stacks an additional 
`OutputRequirementExec` wrapper on every re-optimization pass. The fix adds a 
guard in `require_top_ordering()` to return the plan unchanged when it is 
already topped by `OutputRequirementExec`.
   
   2. **[fix(physical-plan): make HashJoinExec dynamic filter pushdown 
idempotent](https://github.com/apache/datafusion/pull/22523)** — 
`FilterPushdown::new_post_optimization()` unconditionally creates a new 
`DynamicFilterPhysicalExpr` on each pass and ANDs it onto the probe-side scan's 
predicate, producing `DynamicFilter AND DynamicFilter AND ...` after N replans. 
The fix skips dynamic-filter creation when `HashJoinExec` already carries one 
from a previous pass.
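
   To illustrate the second fix, here is a minimal toy sketch of the "skip if 
already present" guard. The `Predicate` and `Join` types and both functions are 
hypothetical stand-ins made up for this example; they are not DataFusion's 
`HashJoinExec` or `DynamicFilterPhysicalExpr` and do not reflect the actual 
change in the PR.

```rust
// Toy model only: these types are hypothetical and are not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    /// Placeholder for a join-derived dynamic filter.
    DynamicFilter,
    /// A static predicate, identified by its text.
    Static(String),
    /// Conjunction of two predicates.
    And(Box<Predicate>, Box<Predicate>),
}

#[derive(Debug, Clone, PartialEq)]
struct Join {
    /// Set once the join has created its dynamic filter.
    has_dynamic_filter: bool,
    /// Predicate evaluated by the probe-side scan.
    probe_predicate: Option<Predicate>,
}

/// ANDs `extra` onto an optional existing predicate.
fn and_onto(existing: Option<Predicate>, extra: Predicate) -> Predicate {
    match existing {
        Some(p) => Predicate::And(Box::new(p), Box::new(extra)),
        None => extra,
    }
}

/// Unguarded: every pass ANDs another dynamic filter onto the scan.
fn push_down_unguarded(mut join: Join) -> Join {
    let existing = join.probe_predicate.take();
    join.probe_predicate = Some(and_onto(existing, Predicate::DynamicFilter));
    join.has_dynamic_filter = true;
    join
}

/// Guarded: skip creation when a previous pass already attached a filter.
fn push_down_guarded(join: Join) -> Join {
    if join.has_dynamic_filter {
        return join;
    }
    push_down_unguarded(join)
}

/// Counts dynamic filters in a predicate tree.
fn count_dynamic(p: &Predicate) -> usize {
    match p {
        Predicate::DynamicFilter => 1,
        Predicate::Static(_) => 0,
        Predicate::And(l, r) => count_dynamic(l) + count_dynamic(r),
    }
}
```

   In this model, applying `push_down_unguarded` twice leaves two dynamic 
filters in the probe predicate, while applying `push_down_guarded` twice yields 
the same `Join` as applying it once.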
   
   These were the only two offenders found by running a discovery harness that 
applies each rule in `PhysicalOptimizer::new().rules` twice on a set of plan 
fixtures and compares the output structurally. Once these land in a DataFusion 
release, `WarnOnDuplicateExecRule` in Ballista's AQE chain should no longer 
fire on standard workloads (verified against a multi-stage aggregation fixture 
and TPC-H-style plans). A Ballista-side regression test has also been prepared 
on the `phase-c-aqe-idempotence-regression` branch.
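
   For readers unfamiliar with the approach, the discovery harness idea can be 
sketched roughly as follows. This is a self-contained toy model, not the actual 
harness: `Plan`, `Rule`, the two example rules, and `find_non_idempotent` are 
all hypothetical names, standing in for DataFusion's plan and optimizer-rule 
types.

```rust
// Toy model only: `Plan`, `Rule`, and the rules below are hypothetical
// stand-ins, not DataFusion's `ExecutionPlan` or `PhysicalOptimizerRule`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Aggregate(Box<Plan>),
    OutputRequirement(Box<Plan>),
}

trait Rule {
    fn name(&self) -> &'static str;
    fn optimize(&self, plan: Plan) -> Plan;
}

/// Wraps the plan on every call, so repeated passes stack wrappers.
struct AddWrapperUnguarded;

impl Rule for AddWrapperUnguarded {
    fn name(&self) -> &'static str {
        "add_wrapper_unguarded"
    }
    fn optimize(&self, plan: Plan) -> Plan {
        Plan::OutputRequirement(Box::new(plan))
    }
}

/// Returns the plan unchanged when it is already topped by the wrapper.
struct AddWrapperGuarded;

impl Rule for AddWrapperGuarded {
    fn name(&self) -> &'static str {
        "add_wrapper_guarded"
    }
    fn optimize(&self, plan: Plan) -> Plan {
        if matches!(plan, Plan::OutputRequirement(_)) {
            return plan;
        }
        Plan::OutputRequirement(Box::new(plan))
    }
}

/// Applies each rule twice to every fixture and reports the rules whose
/// second application changes the plan (structural comparison via `PartialEq`).
fn find_non_idempotent(rules: &[Box<dyn Rule>], fixtures: &[Plan]) -> Vec<&'static str> {
    let mut offenders = Vec::new();
    for rule in rules {
        let violates = fixtures.iter().any(|fixture| {
            let once = rule.optimize(fixture.clone());
            let twice = rule.optimize(once.clone());
            once != twice
        });
        if violates {
            offenders.push(rule.name());
        }
    }
    offenders
}
```

   Run over both toy rules with a scan fixture and an aggregate-over-scan 
fixture, `find_non_idempotent` reports only the unguarded rule. A real harness 
would need a structural comparison suited to physical plans (for example, 
comparing a rendered plan string), since this sketch relies on a derived 
`PartialEq`.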


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

