mdashti opened a new pull request, #20455:
URL: https://github.com/apache/datafusion/pull/20455

   ## Which issue does this PR close?
   
   - Closes #20443.
   
   ## Rationale for this change
   
   `SortMergeJoinExec` uses the default `gather_filters_for_pushdown` 
implementation, which marks all parent filters as unsupported. This means 
dynamic filters from TopK (`SortExec` with `fetch`) cannot pass through 
sort-merge joins to reach scan nodes — even though the filter routing logic is 
straightforward for Inner joins. `HashJoinExec` already supports this.
   
   ## What changes are included in this PR?
   
   Implements `gather_filters_for_pushdown` and `handle_child_pushdown_result` 
on `SortMergeJoinExec` for **Inner joins only**. For Inner joins the output 
schema is `[left_cols..., right_cols...]`, so each parent filter is routed to 
the correct child based on its column references using 
`ChildFilterDescription::from_child_with_allowed_indices` (same approach as 
`HashJoinExec`). All non-Inner join types conservatively return 
`all_unsupported`.
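   
   The routing rule above can be illustrated with a small standalone sketch. 
It does not use DataFusion types: `JoinKind`, `Route`, and `route_filter` are 
hypothetical names invented for this example, and the actual patch relies on 
`ChildFilterDescription::from_child_with_allowed_indices` rather than anything 
shown here. The sketch assumes a filter is summarized by the set of output 
column indices it references, and that the join output is laid out as 
`[left_cols..., right_cols...]`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the join types relevant to this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinKind {
    Inner,
    Left,
}

/// Where a parent filter may be sent (illustrative, not a DataFusion type).
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Every referenced column belongs to the left child.
    /// Indices are unchanged because left columns come first.
    Left(BTreeSet<usize>),
    /// Every referenced column belongs to the right child.
    /// Indices are rebased into the right child's schema.
    Right(BTreeSet<usize>),
    /// The filter stays above the join.
    Unsupported,
}

/// Route a parent filter, described by the output column indices it
/// references, to one side of a join with output `[left..., right...]`.
fn route_filter(
    kind: JoinKind,
    left_width: usize,
    right_width: usize,
    cols: &BTreeSet<usize>,
) -> Route {
    // Conservative rule: only Inner joins accept pushdown.
    if kind != JoinKind::Inner {
        return Route::Unsupported;
    }
    // Filters with no column references or out-of-range indices stay put.
    let total = left_width + right_width;
    if cols.is_empty() || cols.iter().any(|&c| c >= total) {
        return Route::Unsupported;
    }
    if cols.iter().all(|&c| c < left_width) {
        Route::Left(cols.clone())
    } else if cols.iter().all(|&c| c >= left_width) {
        Route::Right(cols.iter().map(|&c| c - left_width).collect())
    } else {
        // The filter references both sides, so neither child can evaluate it.
        Route::Unsupported
    }
}
```

   With a two-column left child and a three-column right child, a filter on 
output columns `{2, 4}` would be routed to the right child as `{0, 2}`, while a 
filter on `{1, 2}` spans both sides and stays above the join.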
   
   This is a minimal, non-intrusive patch: it only passes existing parent 
filters through the join. It does not *create* new dynamic filters from join 
keys (that's a separate, larger feature).
   
   ## Are these changes tested?
   
   Yes, at three levels:
   
   1. **Optimizer unit tests** (`filter_pushdown.rs`) — verify plan structure: 
left/right filters route to the correct children, cross-side filters stay 
above the join, 
non-Inner joins reject pushdown, and TopK dynamic filters propagate through SMJ 
to scan nodes.
   2. **SQL logic tests** (`dynamic_filter_pushdown_config.slt`) — end-to-end 
EXPLAIN verification that `DynamicFilter` appears on the correct 
`DataSourceExec` for Inner joins (on both join-key and non-key columns), and is 
absent for Left joins. Includes correctness checks.
   3. **Integration tests** (`smj_filter_pushdown.rs`) — 11 tests using 
in-memory parquet that run each query with and without dynamic filter pushdown 
and assert identical results. Covers TopK on left/right/join-key columns, DESC 
order, multi-column sorts, WHERE clauses, LIMIT edge cases, LEFT JOIN 
correctness, and nested joins.
   
   ## Are there any user-facing changes?
   
   No API changes. Queries using `SortMergeJoinExec` with Inner joins may now 
benefit from dynamic filter pushdown (e.g. TopK pruning), improving performance 
for `ORDER BY ... LIMIT` queries over sort-merge joins.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

