924060929 opened a new pull request, #64793:
URL: https://github.com/apache/doris/pull/64793
### What problem does this PR solve?
Related PR: #63366 (FE local-shuffle planner, merged)
Problem Summary:
Two optimizations on top of the FE local-shuffle planner (#63366),
addressing DORIS-24902 (bucket-shuffle serial bottleneck + parallelism upgrade).
**Part 1: ExchangeNode serial decoupling + bucket-shuffle dest spreading**
When the FE local-shuffle planner is enabled,
`ExchangeNode.isSerialOperatorOnBe` no longer inherits
`fragment.hasSerialScanNode()` — serial status comes from the node itself only.
This decoupling allows `DistributePlanner` to spread bucket-shuffle
destinations to all bucket-owning instances (via `assignedJoinBucketIndexes`)
instead of funneling to the first instance per worker, eliminating the serial
bottleneck diagnosed in DORIS-24902.
Includes BE-side orphan instance fix: when dest spreading targets only
bucket-owning instances, non-owning instances become orphans whose receivers
never get data. Fixed by creating orphan receivers with `num_senders=0` (ready
immediately at EOS) and using per-BE local task index for orphan detection.
**Part 2: Bucket-to-hash parallelism upgrade + RF force_local_merge**
When a pooled-scan fragment has significantly more instances than buckets,
upgrade bucket-shuffle local exchanges to `LOCAL_EXECUTION_HASH_SHUFFLE` so the
join runs at full instance parallelism instead of being capped at bucket count.
- Cores-aware gate: `min(instances, executor_threads) / min(buckets,
executor_threads) > ratio`
- Whole-chain upgrade for stacked bucket joins
- Tunable `bucket_shuffle_downgrade_ratio` (default 0.8)
- RF `force_local_merge` fix: bucket upgrade flips scan from serial to
parallel, breaking the implicit RF merge signal. Added
`TRuntimeFilterDesc.force_local_merge` — FE walks builder→target path after
`AddLocalExchange`; if a `LocalExchangeNode` sits on the path, the target must
merge partial RFs.
### Release note
Add session variable `local_shuffle_bucket_upgrade_ratio` (default 1.0, <= 1
disables) to control bucket-to-hash parallelism upgrade in pooled-scan
fragments. When enabled and per-BE instances significantly exceed bucket count,
bucket-shuffle local exchanges are upgraded to hash exchanges for higher join
parallelism.
### Check List (For Author)
- Test
- [x] Regression test
- [x] Unit Test
- [ ] Manual test
- [ ] No need to test
- Behavior changed:
- [x] Yes. Bucket-shuffle join fragments may now use
`LOCAL_EXECUTION_HASH_SHUFFLE` instead of `BUCKET_HASH_SHUFFLE` local exchanges
when the upgrade ratio is met. Setting `local_shuffle_bucket_upgrade_ratio=0`
(or <= 1) restores the previous behavior.
- Does this need documentation?
- [x] Yes. Session variables `local_shuffle_bucket_upgrade_ratio` and
`bucket_shuffle_downgrade_ratio` should be documented.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]