Prakhar Jain created SPARK-33399:
------------------------------------
Summary: Reduce unneeded exchanges after SortMerge joins
Key: SPARK-33399
URL: https://issues.apache.org/jira/browse/SPARK-33399
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 3.0.1, 3.0.0, 2.4.7
Reporter: Prakhar Jain
Spark introduces unneeded exchanges if there is a Project after Inner join.
Example:
{noformat}
spark.range(10).repartition($"id").createTempView("t1")
spark.range(20).repartition($"id").createTempView("t2")
spark.range(30).repartition($"id").createTempView("t3")
val planned = sql(
"""
|SELECT t2id, t3.id as t3id
|FROM (
| SELECT t1.id as t1id, t2.id as t2id
| FROM t1, t2
| WHERE t1.id = t2.id
|) t12, t3
|WHERE t1id = t3.id
""".stripMargin).queryExecution.executedPlan
*(9) Project [t2id#1034L, id#1004L AS t3id#1035L]
+- *(9) SortMergeJoin [t1id#1033L], [id#1004L], Inner
:- *(6) Sort [t1id#1033L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(t1id#1033L, 5), true, [id=#1343]
<---------------
: +- *(5) Project [id#996L AS t1id#1033L, id#1000L AS t2id#1034L]
: +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
: :- *(2) Sort [id#996L ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(id#996L, 5), true, [id=#1329]
: : +- *(1) Range (0, 10, step=1, splits=2)
: +- *(4) Sort [id#1000L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1335]
: +- *(3) Range (0, 20, step=1, splits=2)
+- *(8) Sort [id#1004L ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(id#1004L, 5), true, [id=#1349]
+- *(7) Range (0, 30, step=1, splits=2){noformat}
The marked exchange in the above plan can be removed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]