mdashti opened a new pull request, #20455: URL: https://github.com/apache/datafusion/pull/20455
## Which issue does this PR close?

- Closes #20443.

## Rationale for this change

`SortMergeJoinExec` uses the default `gather_filters_for_pushdown` implementation, which marks all parent filters as unsupported. This means dynamic filters from TopK (`SortExec` with `fetch`) cannot pass through sort-merge joins to reach scan nodes, even though the filter routing logic is straightforward for Inner joins. `HashJoinExec` already supports this.

## What changes are included in this PR?

Implements `gather_filters_for_pushdown` and `handle_child_pushdown_result` on `SortMergeJoinExec` for **Inner joins only**.

For Inner joins the output schema is `[left_cols..., right_cols...]`, so each parent filter is routed to the correct child based on its column references using `ChildFilterDescription::from_child_with_allowed_indices` (same approach as `HashJoinExec`). All non-Inner join types conservatively return `all_unsupported`.

This is a minimal, non-intrusive patch: static filter passthrough only. No dynamic filter *creation* from join keys (that's a separate, larger feature).

## Are these changes tested?

Yes, at three levels:

1. **Optimizer unit tests** (`filter_pushdown.rs`): verify plan structure. Left/right filters route to the correct children, cross-side filters stay, non-Inner joins reject pushdown, and TopK dynamic filters propagate through SMJ to scan nodes.
2. **SQL logic tests** (`dynamic_filter_pushdown_config.slt`): end-to-end EXPLAIN verification that `DynamicFilter` appears on the correct `DataSourceExec` for Inner joins (on both join-key and non-key columns), and is absent for Left joins. Includes correctness checks.
3. **Integration tests** (`smj_filter_pushdown.rs`): 11 tests using in-memory parquet that run each query with and without dynamic filter pushdown and assert identical results. Covers TopK on left/right/join-key columns, DESC order, multi-column sorts, WHERE clauses, LIMIT edge cases, LEFT JOIN correctness, and nested joins.
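For reviewers, the routing rule described under "What changes are included in this PR?" can be illustrated with a small standalone sketch. This is not the PR's code and does not use DataFusion's API: `Filter`, `Route`, `route_filter`, and `child_indices` are hypothetical names invented for illustration, and a filter is reduced to the set of output-schema column indices it references. It only assumes what the description states: an Inner join's output schema is `[left_cols..., right_cols...]`, and every other join type rejects pushdown.

```rust
use std::collections::BTreeSet;

/// Where a parent filter ends up (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    /// Every referenced column belongs to the left child.
    Left,
    /// Every referenced column belongs to the right child.
    Right,
    /// Cross-side filter, column-free filter, or non-Inner join: not pushed down.
    Unsupported,
}

/// Hypothetical stand-in for a filter expression: only the indices of the
/// join-output columns it references are modelled.
struct Filter {
    columns: BTreeSet<usize>,
}

/// Route a parent filter given the number of left-side columns.
/// Assumes the join output schema is `[left_cols..., right_cols...]`.
fn route_filter(filter: &Filter, left_len: usize, is_inner: bool) -> Route {
    // Non-Inner joins conservatively reject every parent filter.
    if !is_inner || filter.columns.is_empty() {
        return Route::Unsupported;
    }
    if filter.columns.iter().all(|&c| c < left_len) {
        Route::Left
    } else if filter.columns.iter().all(|&c| c >= left_len) {
        Route::Right
    } else {
        // References both sides, so it must stay above the join.
        Route::Unsupported
    }
}

/// Translate join-output indices into child-local indices for a routed filter.
/// (An assumption of this sketch: right-side columns are offset by `left_len`.)
fn child_indices(filter: &Filter, route: Route, left_len: usize) -> Option<Vec<usize>> {
    match route {
        Route::Left => Some(filter.columns.iter().copied().collect()),
        Route::Right => Some(filter.columns.iter().map(|&c| c - left_len).collect()),
        Route::Unsupported => None,
    }
}
```

With two left-side columns, a filter on output columns `{2, 3}` would route to the right child as local columns `[0, 1]`, while a filter on `{1, 2}` spans both sides and stays above the join.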
## Are there any user-facing changes?

No API changes. Queries using `SortMergeJoinExec` with Inner joins may now benefit from dynamic filter pushdown (e.g. TopK pruning), improving performance for `ORDER BY ... LIMIT` queries over sort-merge joins.
