Dandandan opened a new pull request, #23111:
URL: https://github.com/apache/datafusion/pull/23111

   ## Which issue does this PR close?
   
   - N/A (new optimization). Happy to file a tracking issue if preferred.
   
   ## Rationale for this change
   
   `SortMergeJoinExec` currently always runs in a symmetric, hash-partitioned 
mode:
   both inputs are hash-partitioned on the join keys and sorted, and partition 
`i`
   of the left is merge-joined with partition `i` of the right.
   
   When one side is small, hash-repartitioning the *large* side is wasteful. For
   hash joins this is already handled by `PartitionMode::CollectLeft` (collect 
the
   small build side once, broadcast it, don't repartition the probe side). This 
PR
   brings the same idea to sort-merge joins.
   
   ## What changes are included in this PR?
   
   Add `PartitionMode::CollectLeft` support to `SortMergeJoinExec`:
   
   - The **left** side is collected into a single, fully-sorted run (shared once
     across all right partitions via `OnceAsync`) and the **right** side is left
     un-repartitioned — each right partition merge-joins the full collected 
left,
     producing one output partition per right partition.
   - `required_input_distribution` becomes `[SinglePartition, Unspecified]` (the
     right keeps whatever partitioning it has; both sides still require their 
sort
     ordering), and output partitioning follows the right side.
   - Supported only for join types whose output is determined per right 
partition:
     `Inner`, `Right`, `RightSemi`, `RightAnti`, `RightMark`. Left-side joins
     (`Left`/`LeftSemi`/`LeftAnti`/`LeftMark`/`Full`) would require tracking
     left-row matches across all right partitions and are rejected by 
`with_mode`.
   - The `JoinSelection` physical-optimizer rule switches a `Partitioned` SMJ to
     `CollectLeft` when the join type is supported and the left side is 
estimated
     to be small enough, reusing the existing
     `hash_join_single_partition_threshold` / 
`hash_join_single_partition_threshold_rows`
     thresholds. `EnsureRequirements` then collapses the left to one sorted run
     (`SortExec` / `SortPreservingMergeExec`) and leaves the right 
un-hash-partitioned.
   - The mode is shown in `EXPLAIN` (only when `CollectLeft`, to avoid churning
     existing `Partitioned` plans) and is preserved across `with_new_children` 
and
     projection pushdown; `partition_statistics` is made mode-aware (mirroring
     `HashJoinExec`).
   
   The existing k-way merge / streamed-vs-buffered execution is reused 
unchanged:
   the collected left is replayed as the left input to each right partition's 
join.
   
   ## Are these changes tested?
   
   Yes:
   - Unit tests (`physical-plan`): `CollectLeft` output equals the default
     `Partitioned` output for all supported join types over a multi-partition
     right; unsupported join types are rejected; mode-aware 
`partition_statistics`
     over a multi-partition right.
   - Integration tests (`core`): `JoinSelection` selects `CollectLeft` for a 
small
     left, and stays `Partitioned` for a big left or an unsupported join type.
   - sqllogictest: `sort_merge_join.slt` / `joins.slt` exercise the full 
pipeline
     (`JoinSelection` → `EnsureRequirements` → `SanityCheckPlan` → execution) 
with
     `prefer_hash_join = false`; results are unchanged, only `EXPLAIN` plans 
change
     (no hash repartition on the join key; left collapsed to one sorted run). A 
new
     regression in `sort_merge_join.slt` covers a narrowing projection pushed 
into a
     `CollectLeft` join over a multi-partition right.
   
   ## Are there any user-facing changes?
   
   `EXPLAIN` output now shows `mode=CollectLeft` on sort-merge joins that use 
the
   new mode, and such plans avoid hash-repartitioning the right input. No public
   API breakage (the mode defaults to `Partitioned`; `with_mode` is additive). 
No
   new configuration options.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to