[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-9561:
-------------------------
    Attachment: HIVE-9561.1-spark.patch

The patch only partially solves the problem. In some cases we still use SHUFFLE_SORT when we don't have to. For example:
{code}
explain select * from
  (select pagerank from tiny sort by pagerank) t
  join
  (select * from skewed order by pagerank) s
  on t.pagerank=s.pagerank;
{code}
This query has both a sort-by and an order-by. However, the patch can't distinguish between the two, so we'll use SHUFFLE_SORT for both of them. [~xuefuz] any ideas?

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-9561
>                 URL: https://issues.apache.org/jira/browse/HIVE-9561
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance
> and are difficult to control. So we should limit the use of {{sortByKey}} to
> order by query only.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)