[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

Xuefu Zhang (JIRA) Thu, 05 Feb 2015 16:04:01 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308303#comment-14308303
 ]


Xuefu Zhang commented on HIVE-9561:
-----------------------------------

Does QueryProperties.hasSortBy() help?

For a sort by query, we don't need SHUFFLE_SORT, right? MR_SHUFFLE may be 
sufficient.
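A minimal sketch of the distinction, assuming ORDER BY needs one global ordering across all reducers while SORT BY only promises ordering within each reducer's output. `choose_shuffle`, `mr_shuffle`, and the `NO_SORT` member are hypothetical names for illustration; only `SHUFFLE_SORT` and `MR_SHUFFLE` come from the discussion, and the boolean flags stand in for what something like `QueryProperties.hasSortBy()` would report. This is not Hive's actual planner code.

```python
from enum import Enum


class ShuffleType(Enum):
    # SHUFFLE_SORT / MR_SHUFFLE are named in the discussion; NO_SORT is hypothetical.
    SHUFFLE_SORT = "total order (sortByKey, range partitioned)"
    MR_SHUFFLE = "hash partitioned, sorted within each partition"
    NO_SORT = "hash partitioned, no sort"


def choose_shuffle(has_order_by: bool, has_sort_by: bool) -> ShuffleType:
    """Hypothetical helper: pick the cheapest shuffle that satisfies the query."""
    if has_order_by:
        # Only ORDER BY requires a single ordering across all reducers.
        return ShuffleType.SHUFFLE_SORT
    if has_sort_by:
        # SORT BY only requires each reducer's output to be sorted.
        return ShuffleType.MR_SHUFFLE
    return ShuffleType.NO_SORT


def mr_shuffle(keys, num_partitions):
    """Hash-partition the keys, then sort each partition independently.

    No sampling pass over the input is needed to decide partition boundaries.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[hash(key) % num_partitions].append(key)
    return [sorted(p) for p in partitions]
```

For example, `mr_shuffle([5, 3, 8, 1, 4, 7], 2)` yields two individually sorted partitions whose concatenation is not globally sorted, which is exactly what SORT BY allows and ORDER BY does not.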

For this particular query, it seems to make no sense to have either sort by or 
order by in the subqueries. They make sense only if they are specified for the 
final output. Do you agree?

Maybe we can add an optimization that detects and removes those clauses when 
they do not apply to the final output.
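A minimal sketch of such a pass over a toy operator tree, assuming `Select`, `Filter`, and `Limit` pass their input order through unchanged while operators like `Union` or `Join` do not. The `Node` type, operator names, and `prune_sorts` are hypothetical and not Hive's operator classes. One caveat the sketch keeps: a sort directly under a `Limit` (ORDER BY ... LIMIT in a subquery) decides which rows survive, so it cannot be dropped even when its order is not visible in the final output.

```python
from dataclasses import dataclass, field
from typing import List

# Operators assumed (for this sketch) to preserve the order of their input.
ORDER_PRESERVING = {"Select", "Filter", "Limit"}


@dataclass
class Node:
    op: str
    children: List["Node"] = field(default_factory=list)


def prune_sorts(node: Node, order_observable: bool = True, parent_op: str = "") -> Node:
    """Remove Sort nodes whose ordering cannot affect the query result.

    order_observable is True while every operator between this node and the
    root preserves order; a Sort, Union, Join, etc. makes it False below it.
    """
    child_observable = order_observable and node.op in ORDER_PRESERVING
    children = [prune_sorts(c, child_observable, node.op) for c in node.children]
    if node.op == "Sort" and not order_observable and parent_op != "Limit":
        # Nothing downstream can see this order: splice the Sort out.
        return children[0]
    return Node(node.op, children)
```

For instance, `Select(Union(Sort(Scan), Sort(Scan)))` becomes `Select(Union(Scan, Scan))`, while a top-level `Sort` and a `Sort` feeding a `Limit` are both kept.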

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-9561
>                 URL: https://issues.apache.org/jira/browse/HIVE-9561
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by queries only.
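Why a total-order shuffle needs a probe job can be sketched as follows, assuming a simplified range partitioner in the spirit of what Spark's `sortByKey` uses: the partition boundaries must be chosen from a sample of the input before any record can be routed, which costs an extra pass. `range_bounds` and `range_shuffle` are hypothetical names, and this is not Spark's actual `RangePartitioner` implementation.

```python
import random
from bisect import bisect_left


def range_bounds(keys, num_partitions, sample_size, seed=0):
    """Probe pass: sample the keys and pick num_partitions - 1 split points.

    This extra pass stands in for the sampling job a range partitioner runs
    before the real shuffle.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_shuffle(keys, num_partitions, sample_size=100):
    """Route each key to the partition whose range contains it, then sort."""
    bounds = range_bounds(keys, num_partitions, sample_size)  # pass 1: probe
    parts = [[] for _ in range(num_partitions)]
    for k in keys:  # pass 2: the shuffle itself
        parts[bisect_left(bounds, k)].append(k)
    # Partitions cover increasing key ranges, so concatenating them is globally sorted.
    return [sorted(p) for p in parts]
```

By contrast, the hash-partitioned shuffle needed for sort by can route records immediately, with no sampling pass.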



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
