[ https://issues.apache.org/jira/browse/SPARK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-20010:
-----------------------------------

    Assignee: Zhenhua Wang

> Sort information is lost after sort merge join
> ----------------------------------------------
>
>                 Key: SPARK-20010
>                 URL: https://issues.apache.org/jira/browse/SPARK-20010
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>            Assignee: Zhenhua Wang
>             Fix For: 2.2.0
>
>
> After a sort merge join (SMJ) for an inner join, we currently keep only the
> left key ordering. However, after an inner join, the right key has the same
> value and order as the left key. So if another SMJ is needed on the right
> key, we unnecessarily add a sort, which incurs additional cost.
> As a more complicated example, consider A join B on A.key = B.key join C on
> B.key = C.key join D on A.key = D.key. We unnecessarily add a sort on B.key
> when joining {A, B} and C, and a sort on A.key when joining {A, B, C} and D.
> To fix this, we need to propagate all sort information (equivalent
> expressions) from the bottom up through `outputOrdering` and `SortOrder`.
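The effect described above can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not Spark code: `SortOrder`, `satisfies`, `smj` and `plan` are hypothetical names, the model handles only single-key, ascending inner joins on initially unsorted inputs, and it ignores sort direction and null ordering. It counts the sorts inserted for the A/B/C/D example with and without propagating equivalent expressions.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a sort order: the expression the data is
# sorted by, plus other expressions known to follow the same order.
@dataclass(frozen=True)
class SortOrder:
    child: str
    same_order: frozenset = frozenset()

    def keys(self):
        # Every expression this ordering is known to sort by.
        return {self.child} | set(self.same_order)


def satisfies(ordering, required_key):
    """True if the first output sort order already sorts by required_key."""
    return bool(ordering) and required_key in ordering[0].keys()


def smj(left, right, left_key, right_key, propagate, sorts):
    """Model an inner SMJ on left_key = right_key.

    left/right are the children's output orderings (lists of SortOrder);
    every sort the model has to insert is appended to `sorts`.
    """
    if not satisfies(left, left_key):
        sorts.append(left_key)
        left = [SortOrder(left_key)]
    if not satisfies(right, right_key):
        sorts.append(right_key)
        right = [SortOrder(right_key)]
    if propagate:
        # Inner join: both keys are equal in every output row, so all
        # equivalent expressions from both sides describe the same order.
        merged = left[0].keys() | right[0].keys()
        return [SortOrder(left_key, frozenset(merged - {left_key}))]
    # Behavior described in the issue: only the left key ordering survives.
    return [SortOrder(left_key)]


def plan(propagate):
    """A join B on A.key = B.key join C on B.key = C.key join D on A.key = D.key."""
    sorts = []
    ab = smj([], [], "A.key", "B.key", propagate, sorts)
    abc = smj(ab, [], "B.key", "C.key", propagate, sorts)
    smj(abc, [], "A.key", "D.key", propagate, sorts)
    return sorts


print(plan(propagate=False))  # ['A.key', 'B.key', 'B.key', 'C.key', 'A.key', 'D.key']
print(plan(propagate=True))   # ['A.key', 'B.key', 'C.key', 'D.key']
```

In this model, keeping only the left key ordering forces the second sorts on B.key and A.key that the issue describes, while carrying the equivalent keys upward removes both.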



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
