[
https://issues.apache.org/jira/browse/SPARK-33400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228680#comment-17228680
]
Apache Spark commented on SPARK-33400:
--------------------------------------
User 'prakharjain09' has created a pull request for this issue:
https://github.com/apache/spark/pull/30302
> Reduce unneeded sorts between two SortMergeJoins
> ------------------------------------------------
>
> Key: SPARK-33400
> URL: https://issues.apache.org/jira/browse/SPARK-33400
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 2.4.7, 3.0.0, 3.0.1
> Reporter: Prakhar Jain
> Priority: Major
>
> When a SortMergeJoin is followed by a Project with aliases, the
> outputOrdering is not propogated properly and in some cases, it leads to
> unrequired Sort operation:
>
>
> {noformat}
> spark.range(10).repartition($"id").createTempView("t1")
> spark.range(20).repartition($"id").createTempView("t2")
> spark.range(30).repartition($"id").createTempView("t3")
> val planned = sql(
> """
> |SELECT t2id, t3.id as t3id
> |FROM (
> | SELECT t1.id as t1id, t2.id as t2id
> | FROM t1, t2
> | WHERE t1.id = t2.id
> |) t12, t3
> |WHERE t1id = t3.id
> """.stripMargin).queryExecution.executedPlan
> *(8) Project [t2id#1059L, id#1004L AS t3id#1060L]
> +- *(8) SortMergeJoin [t2id#1059L], [id#1004L], Inner
> :- *(5) Sort [t2id#1059L ASC NULLS FIRST ], false, 0
> <-----------------------
> : +- *(5) Project [id#1000L AS t2id#1059L]
> : +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
> : :- *(2) Sort [id#996L ASC NULLS FIRST ], false, 0
> : : +- Exchange hashpartitioning(id#996L, 5), true, [id=#1426]
> : : +- *(1) Range (0, 10, step=1, splits=2)
> : +- *(4) Sort [id#1000L ASC NULLS FIRST ], false, 0
> : +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1432]
> : +- *(3) Range (0, 20, step=1, splits=2)
> +- *(7) Sort [id#1004L ASC NULLS FIRST ], false, 0
> +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1443]
> +- *(6) Range (0, 30, step=1, splits=2)
> {noformat}
> The above marked Sort node could have been avoided.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]