Qi Zhu created SPARK-47284:
------------------------------
Summary: We should ensure enough parallelism when
ShuffleExchangeLike join with specs without shuffle
Key: SPARK-47284
URL: https://issues.apache.org/jira/browse/SPARK-47284
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Qi Zhu
The following case is introduced by
https://issues.apache.org/jira/browse/SPARK-35703
// When choosing specs, we should consider those children with no
`ShuffleExchangeLike` node
// first. For instance, if we have:
// A: (No_Exchange, 100) <---> B: (Exchange, 120)
// it's better to pick A and change B to (Exchange, 100) instead of picking B
and insert a
// new shuffle for A.
But we'd better improve it in some cases, for example:
A: (No_Exchange, 2) <---> B: (Exchange, 100)
The current logic will change to:
A: (No_Exchange, 2) <---> B: (Exchange,2)
It actually not ensure enough parallelism, it will reduce the performance i
think.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]