[ 
https://issues.apache.org/jira/browse/SPARK-33399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228664#comment-17228664
 ] 

Apache Spark commented on SPARK-33399:
--------------------------------------

User 'prakharjain09' has created a pull request for this issue:
https://github.com/apache/spark/pull/30300

> Reduce unneeded exchanges after SortMerge joins
> -----------------------------------------------
>
>                 Key: SPARK-33399
>                 URL: https://issues.apache.org/jira/browse/SPARK-33399
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.0, 3.0.1
>            Reporter: Prakhar Jain
>            Priority: Major
>
> Spark introduces unneeded exchanges when a Project follows an Inner join.
> Example:
>  
> {noformat}
> spark.range(10).repartition($"id").createTempView("t1")
> spark.range(20).repartition($"id").createTempView("t2")
> spark.range(30).repartition($"id").createTempView("t3")
> val planned = sql(
>    """
>      |SELECT t2id, t3.id as t3id
>      |FROM (
>      |    SELECT t1.id as t1id, t2.id as t2id
>      |    FROM t1, t2
>      |    WHERE t1.id = t2.id
>      |) t12, t3
>      |WHERE t1id = t3.id
>    """.stripMargin).queryExecution.executedPlan
> *(9) Project [t2id#1034L, id#1004L AS t3id#1035L]
> +- *(9) SortMergeJoin [t1id#1033L], [id#1004L], Inner
>    :- *(6) Sort [t1id#1033L ASC NULLS FIRST], false, 0
>    :  +- Exchange hashpartitioning(t1id#1033L, 5), true, [id=#1343] <---------------
>    :     +- *(5) Project [id#996L AS t1id#1033L, id#1000L AS t2id#1034L]
>    :        +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
>    :           :- *(2) Sort [id#996L ASC NULLS FIRST], false, 0
>    :           :  +- Exchange hashpartitioning(id#996L, 5), true, [id=#1329]
>    :           :     +- *(1) Range (0, 10, step=1, splits=2)
>    :           +- *(4) Sort [id#1000L ASC NULLS FIRST], false, 0
>    :              +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1335]
>    :                 +- *(3) Range (0, 20, step=1, splits=2)
>    +- *(8) Sort [id#1004L ASC NULLS FIRST], false, 0
>       +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1349]
>          +- *(7) Range (0, 30, step=1, splits=2)
> {noformat}
> The marked exchange in the above plan can be removed.
>  
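
Reading the plan above: the inner SortMergeJoin's output is already hash-partitioned on id#996L into 5 partitions, and the Project only renames that column to t1id#1033L. The marked Exchange asks for hashpartitioning(t1id#1033L, 5), which is the same distribution under a new name. The sketch below is a toy model of that reasoning in plain Scala. It is not Spark's implementation and makes no claim about how the linked pull request works; `Attr`, `Alias`, `HashPartitioning`, `throughProject` and `satisfies` are hypothetical names defined here for illustration only.

```scala
// Toy model only: these are NOT Spark classes. All names are hypothetical.
object PartitioningSketch {
  // A column identified by name and expression id, e.g. id#996L -> Attr("id", 996).
  final case class Attr(name: String, exprId: Int)

  // One Project entry of the form `child AS alias`.
  final case class Alias(child: Attr, alias: Attr)

  // Rows are distributed by hashing `keys` into `numPartitions` buckets.
  final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int)

  // Rewrite the partitioning keys through the Project's aliases, so the
  // partitioning is expressed in terms of the Project's output columns.
  def throughProject(p: HashPartitioning, aliases: Seq[Alias]): HashPartitioning = {
    val rename = aliases.map(a => a.child -> a.alias).toMap
    p.copy(keys = p.keys.map(k => rename.getOrElse(k, k)))
  }

  // In this toy model a requirement is met only by an identical partitioning.
  def satisfies(actual: HashPartitioning, required: HashPartitioning): Boolean =
    actual == required

  def main(args: Array[String]): Unit = {
    val id996  = Attr("id", 996)
    val id1000 = Attr("id", 1000)
    val t1id   = Attr("t1id", 1033)
    val t2id   = Attr("t2id", 1034)

    // Output of the inner join (stage 5), partitioned on its left join key.
    val joinOutput = HashPartitioning(Seq(id996), 5)
    // Project [id#996L AS t1id#1033L, id#1000L AS t2id#1034L]
    val project = Seq(Alias(id996, t1id), Alias(id1000, t2id))
    // What the outer join (stage 9) requires of its left child.
    val required = HashPartitioning(Seq(t1id), 5)

    // Ignoring the aliases, the requirement looks unmet, so a shuffle is added.
    println(satisfies(joinOutput, required))                          // false
    // Once the aliases are taken into account, no shuffle is needed.
    println(satisfies(throughProject(joinOutput, project), required)) // true
  }
}
```

In this model the extra exchange appears only because the partitioning is compared by the pre-alias column id#996L; once the rename is applied the requirement is already met.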



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

