neilconway opened a new pull request, #22534: URL: https://github.com/apache/datafusion/pull/22534
EliminateOuterJoin previously only matched the literal Filter -> Join pattern. When a Projection sits between the Filter and the Join, the rule no-ops and the outer join stays in place even when the predicate above the projection would justify converting it.

A common shape that hits this comes from projection pruning after filter pushdown. In TPC-DS q49, PushDownFilter moves the returns-side predicate above the sales/returns LEFT JOIN, then OptimizeProjections inserts a pruning Projection between that Filter and the LEFT JOIN. The returns-side predicate still filters out the outer rows, but the projection hides the join from the old rule.

Extend the rule to walk down through Projection nodes between Filter and Join, rewriting a working copy of the predicate into the join's coordinate space for analysis. The rewritten predicate is used only for analysis; the original predicate and surrounding plan structure are preserved on success.

Tests cover passthrough projection, aliased projection, negative cases, a non-transparent Limit guard, and SQL-level q49-shaped cases where OptimizeProjections places a pruning Projection between a returns-side Filter and the sales/returns LEFT JOIN.

## Which issue does this PR close?

- Closes #22531.

## Rationale for this change

`EliminateOuterJoin` previously looked for plans with a `Filter` directly above a `Join`. For most queries, that is the right plan shape to look for (because `PushDownFilter` will typically place the filters that are useful for outer join elimination directly on top of the relevant `Join`). However, some plans don't follow this shape, for at least two reasons:

1. Volatile expressions can interfere with filter pushdown
2. `OptimizeProjections` might insert a `Projection` between the `Filter` and `Join`

Notably, we run into case (2) in TPC-DS Q49; we currently fail to convert three outer joins to inner joins for that reason.

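
To make the plan shape in case (2) concrete, here is a minimal Rust sketch using a toy plan type. `Plan`, `matches_direct`, and `matches_through_projections` are hypothetical names invented for illustration; they are not DataFusion's `LogicalPlan` or the actual rule code. The sketch only models which node types sit between the `Filter` and the `Join`, and shows that skipping `Projection` nodes (but not other nodes such as `Limit`) finds the join again.

```rust
// Toy logical plan for illustration only; NOT DataFusion's `LogicalPlan`.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    Filter(Box<Plan>),
    Projection(Box<Plan>),
    Limit(Box<Plan>),
    LeftJoin(Box<Plan>, Box<Plan>),
}

/// Models the old matching described above: a `Filter` whose direct
/// input is the join. (Hypothetical helper.)
fn matches_direct(plan: &Plan) -> bool {
    match plan {
        Plan::Filter(input) => matches!(&**input, Plan::LeftJoin(..)),
        _ => false,
    }
}

/// Models the extended matching: start at a `Filter`, skip any number of
/// `Projection` nodes, and require a join underneath. Any other node
/// (for example `Limit`) stops the descent. (Hypothetical helper.)
fn matches_through_projections(plan: &Plan) -> bool {
    let mut node: &Plan = match plan {
        Plan::Filter(input) => &**input,
        _ => return false,
    };
    while let Plan::Projection(child) = node {
        node = &**child;
    }
    matches!(node, Plan::LeftJoin(..))
}
```

With these definitions, a `Filter -> Projection -> LeftJoin` plan is rejected by `matches_direct` but accepted by `matches_through_projections`, while `Filter -> Limit -> LeftJoin` is rejected by both.
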
We can handle this by teaching `EliminateOuterJoin` to descend through one or more intermediate `Projection` nodes, rewriting the filter predicate as it goes to account for the effect of the projection.

## What changes are included in this PR?

* Teach `EliminateOuterJoin` to descend through one or more `Projection` nodes
* Refactor various code in `eliminate_outer_join.rs`, improve comments
* Add unit tests
* Add SLT tests

## Are these changes tested?

Yes, new tests added. Manually verified that we fail to eliminate the outer joins in TPC-DS Q49 without this change and succeed in doing so with this change.

## Are there any user-facing changes?

More effective outer join query optimization.
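
The predicate rewrite described in the rationale (expressing a filter written against a projection's output in terms of the join's columns) might look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: `Expr`, `col`, `rewrite_through_projection`, and `columns` are hypothetical names, the column names are made up, and the projection is modelled simply as a map from output name to defining expression.

```rust
use std::collections::HashMap;

// Toy expression type for illustration only; NOT DataFusion's `Expr`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Convenience constructor for a column reference. (Hypothetical helper.)
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Build a copy of `pred` in which every column is replaced by the
/// expression that defines it in the projection. Returns `None` if the
/// predicate references a column the projection does not produce.
fn rewrite_through_projection(pred: &Expr, proj: &HashMap<String, Expr>) -> Option<Expr> {
    Some(match pred {
        Expr::Column(name) => proj.get(name)?.clone(),
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Gt(l, r) => Expr::Gt(
            Box::new(rewrite_through_projection(l, proj)?),
            Box::new(rewrite_through_projection(r, proj)?),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(rewrite_through_projection(l, proj)?),
            Box::new(rewrite_through_projection(r, proj)?),
        ),
    })
}

/// Collect the column names an expression references, left to right.
fn columns(expr: &Expr, out: &mut Vec<String>) {
    match expr {
        Expr::Column(name) => out.push(name.clone()),
        Expr::Literal(_) => {}
        Expr::Gt(l, r) | Expr::And(l, r) => {
            columns(l, out);
            columns(r, out);
        }
    }
}
```

For a stack of projections, the same function would be applied once per `Projection`, from the top down. The rewritten copy is only inspected (here via `columns`) to see which join side the predicate constrains; the original predicate is left untouched, consistent with the description above.
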
