[
https://issues.apache.org/jira/browse/SPARK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan reassigned SPARK-20010:
-----------------------------------
Assignee: Zhenhua Wang
> Sort information is lost after sort merge join
> ----------------------------------------------
>
> Key: SPARK-20010
> URL: https://issues.apache.org/jira/browse/SPARK-20010
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Zhenhua Wang
> Assignee: Zhenhua Wang
> Fix For: 2.2.0
>
>
> After sort merge join for inner join, now we only keep left key ordering.
> However, after inner join, right key has the same value and order as left
> key. So if we need another smj on right key, we will unnecessarily add a sort
> which causes additional cost.
> As a more complicated example, A join B on A.key = B.key join C on B.key =
> C.key join D on A.key = D.key. We will unnecessarily add a sort on B.key when
> join \{A, B\} and C, and add a sort on A.key when join \{A, B, C\} and D.
> To fix this, we need to propagate all sorted information (equivalent
> expressions) from bottom up through `outputOrdering` and `SortOrder`.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]