andygrove opened a new issue, #1905:
URL: https://github.com/apache/datafusion-ballista/issues/1905
## Is your feature request related to a problem or challenge?
When the static distributed planner promotes a join to a broadcast
`HashJoinExec(CollectLeft)` (`maybe_promote_to_broadcast` + the broadcast
lowering in `plan_query_stages_internal`), the join's inputs still carry the
join-key `RepartitionExec(Hash)` that `EnforceDistribution` inserted for the
original partitioned join. As a result:
- **Build side** is hash-repartitioned into a shuffle stage and *then* written
  again as a broadcast stage: two shuffles where broadcast should need only one
  (just replicate the build's natural partitions).
- **Probe side** still reshuffles on the join key, even though a `CollectLeft`
  join replicates the build to every probe task and does not require the probe
  to be partitioned on the join key.
So the broadcast promotion adds a broadcast without removing the reshuffles it
was meant to avoid. On TPC-H SF10 (AQE off) the SMJ-broadcast path (#1904)
still gives a ~16% improvement because eliminating the sort + collecting a
small side helps, but a large part of the intended benefit (skipping the
join-key shuffles entirely) is left on the table.
This affects **both** broadcast paths:
- the hash-join broadcast from #1647, and
- the sort-merge-join broadcast from #1904 (#1679).
## Describe the solution you'd like
During broadcast lowering, strip the redundant join-key `RepartitionExec(Hash)`
from the converted join's inputs:
- broadcast the build side from its natural (upstream) partitions instead of
  hash-repartitioning then broadcasting;
- keep the probe side at its upstream partitioning rather than reshuffling on
  the join key.
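
To make the proposed rewrite concrete, here is a minimal sketch on a toy plan tree. Everything in it is hypothetical: `Plan`, `strip_join_key_repartition`, and `lower_broadcast_join` are stand-ins invented for illustration and are not DataFusion or Ballista types. The sketch only removes a hash repartition that sits directly on top of a join input and whose keys match that side's join keys; the real lowering would operate on `ExecutionPlan` nodes and may need to look through other operators.

```rust
// Toy model only: hypothetical stand-ins, not the DataFusion/Ballista API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, partitions: usize },
    HashRepartition { keys: Vec<String>, partitions: usize, input: Box<Plan> },
    CollectLeftJoin {
        build_keys: Vec<String>,
        probe_keys: Vec<String>,
        build: Box<Plan>,
        probe: Box<Plan>,
    },
}

/// Drop a hash repartition that sits directly on top of a join input when its
/// keys are exactly that side's join keys. Any other plan is returned as-is.
fn strip_join_key_repartition(plan: Plan, join_keys: &[String]) -> Plan {
    match plan {
        Plan::HashRepartition { keys, input, .. } if keys.as_slice() == join_keys => *input,
        other => other,
    }
}

/// For a broadcast (CollectLeft) join, strip the redundant join-key
/// repartition from both inputs so each side keeps its upstream partitioning.
fn lower_broadcast_join(plan: Plan) -> Plan {
    match plan {
        Plan::CollectLeftJoin { build_keys, probe_keys, build, probe } => {
            let build = strip_join_key_repartition(*build, &build_keys);
            let probe = strip_join_key_repartition(*probe, &probe_keys);
            Plan::CollectLeftJoin {
                build_keys,
                probe_keys,
                build: Box::new(build),
                probe: Box::new(probe),
            }
        }
        other => other,
    }
}
```

In this model, a join whose inputs are `HashRepartition` nodes keyed on the join keys comes out with plain `Scan` inputs, while a repartition on unrelated keys is left in place because it was not inserted for this join.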
## Additional context
Correctness to verify: a `CollectLeft` join's output partitioning follows the
probe side. Removing the probe's join-key repartition changes the probe (and
therefore the join output) partitioning, so downstream operators that assume
hash-on-join-key partitioning must be re-checked / re-satisfied by
`EnforceDistribution`/`EnforceSorting`. Build-side broadcast must still
replicate all build partitions to every probe task.
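
The partitioning concern can be illustrated with a small toy check. The names here (`Partitioning`, `collect_left_output_partitioning`, `needs_repartition`) are hypothetical and only loosely modeled on the idea of a plan's output partitioning; they are not the DataFusion API, and the equality test on key names is a simplification of how a real distribution requirement would be checked.

```rust
// Toy model only: hypothetical stand-ins, not the DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    /// Rows are hash-partitioned on these keys into `usize` partitions.
    Hash(Vec<String>, usize),
    /// `usize` partitions with no known distribution.
    Unknown(usize),
}

/// As stated above, a CollectLeft join's output partitioning follows the
/// probe side, so this toy version simply forwards the probe's partitioning.
fn collect_left_output_partitioning(probe: &Partitioning) -> Partitioning {
    probe.clone()
}

/// Does a downstream operator that requires hash partitioning on
/// `required_hash_keys` need a repartition inserted below it?
fn needs_repartition(actual: &Partitioning, required_hash_keys: Option<&[String]>) -> bool {
    match (actual, required_hash_keys) {
        // No distribution requirement: nothing to re-satisfy.
        (_, None) => false,
        // Already hashed: fine only if the keys match the requirement.
        (Partitioning::Hash(keys, _), Some(required)) => keys.as_slice() != required,
        // Unknown distribution cannot satisfy a hash requirement.
        (Partitioning::Unknown(_), Some(_)) => true,
    }
}
```

With the probe-side repartition in place, the join output is `Hash` on the join key and a downstream operator requiring that distribution is already satisfied. Once the repartition is stripped, the output falls back to the probe's upstream partitioning (`Unknown` in this model), and the same requirement has to be re-satisfied by a later pass.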
Follow-up to #1904 (#1679). Related: #342, #348.