Dandandan opened a new pull request, #23111:
URL: https://github.com/apache/datafusion/pull/23111
## Which issue does this PR close?
- N/A (new optimization). Happy to file a tracking issue if preferred.
## Rationale for this change
`SortMergeJoinExec` currently always runs in a symmetric, hash-partitioned
mode:
both inputs are hash-partitioned on the join keys and sorted, and partition
`i`
of the left is merge-joined with partition `i` of the right.
When one side is small, hash-repartitioning the *large* side is wasteful. For
hash joins this is already handled by `PartitionMode::CollectLeft` (collect
the
small build side once, broadcast it, don't repartition the probe side). This
PR
brings the same idea to sort-merge joins.
## What changes are included in this PR?
Add `PartitionMode::CollectLeft` support to `SortMergeJoinExec`:
- The **left** side is collected into a single, fully-sorted run (shared once
across all right partitions via `OnceAsync`) and the **right** side is left
un-repartitioned — each right partition merge-joins the full collected
left,
producing one output partition per right partition.
- `required_input_distribution` becomes `[SinglePartition, Unspecified]` (the
right keeps whatever partitioning it has; both sides still require their
sort
ordering), and output partitioning follows the right side.
- Supported only for join types whose output is determined per right
partition:
`Inner`, `Right`, `RightSemi`, `RightAnti`, `RightMark`. Left-side joins
(`Left`/`LeftSemi`/`LeftAnti`/`LeftMark`/`Full`) would require tracking
left-row matches across all right partitions and are rejected by
`with_mode`.
- The `JoinSelection` physical-optimizer rule switches a `Partitioned` SMJ to
`CollectLeft` when the join type is supported and the left side is
estimated
to be small enough, reusing the existing
`hash_join_single_partition_threshold` /
`hash_join_single_partition_threshold_rows`
thresholds. `EnsureRequirements` then collapses the left to one sorted run
(`SortExec` / `SortPreservingMergeExec`) and leaves the right
un-hash-partitioned.
- The mode is shown in `EXPLAIN` (only when `CollectLeft`, to avoid churning
existing `Partitioned` plans) and is preserved across `with_new_children`
and
projection pushdown; `partition_statistics` is made mode-aware (mirroring
`HashJoinExec`).
The existing k-way merge / streamed-vs-buffered execution is reused
unchanged:
the collected left is replayed as the left input to each right partition's
join.
## Are these changes tested?
Yes:
- Unit tests (`physical-plan`): `CollectLeft` output equals the default
`Partitioned` output for all supported join types over a multi-partition
right; unsupported join types are rejected; mode-aware
`partition_statistics`
over a multi-partition right.
- Integration tests (`core`): `JoinSelection` selects `CollectLeft` for a
small
left, and stays `Partitioned` for a big left or an unsupported join type.
- sqllogictest: `sort_merge_join.slt` / `joins.slt` exercise the full
pipeline
(`JoinSelection` → `EnsureRequirements` → `SanityCheckPlan` → execution)
with
`prefer_hash_join = false`; results are unchanged, only `EXPLAIN` plans
change
(no hash repartition on the join key; left collapsed to one sorted run). A
new
regression in `sort_merge_join.slt` covers a narrowing projection pushed
into a
`CollectLeft` join over a multi-partition right.
## Are there any user-facing changes?
`EXPLAIN` output now shows `mode=CollectLeft` on sort-merge joins that use
the
new mode, and such plans avoid hash-repartitioning the right input. No public
API breakage (the mode defaults to `Partitioned`; `with_mode` is additive).
No
new configuration options.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]