[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308303#comment-14308303 ]

Xuefu Zhang commented on HIVE-9561:
-----------------------------------

Does QueryProperties.hasSortBy() help? For a sort by query, we don't need SHUFFLE_SORT, right? MR_SHUFFLE may be sufficient.

For this particular query, it seems to make no sense to have either sort by or order by in the subqueries. They make sense only if they are specified for the final output. Do you agree? Maybe we can add an optimization that detects and removes them when they do not apply to the final output.

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-9561
>                 URL: https://issues.apache.org/jira/browse/HIVE-9561
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance
> and are difficult to control. So we should limit the use of {{sortByKey}} to
> order by query only.