[ https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732231#action_12732231 ]

Ashish Thusoo commented on HIVE-647:
------------------------------------

Actually, SORT BY is meant to be a local sort within each reducer rather than a
global sort. It is usually used together with DISTRIBUTE BY: DISTRIBUTE BY
controls how keys are distributed across reducers, and SORT BY controls how
they are sorted within each reducer.

I believe that if you use ORDER BY instead of SORT BY, we automatically select
a single reducer and perform a global sort.
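To make the distinction concrete, here is a minimal Python simulation (not Hive
code) of the two behaviours described above. The partition function, the
function names and the sample rows are hypothetical; Hive's real partitioner
differs, but the point only depends on rows being split across more than one
reducer whose outputs are concatenated rather than merged.

```python
from collections import defaultdict

def partition(key, num_reducers):
    # Hypothetical stand-in for Hive's partitioner; deterministic so the demo is repeatable.
    return sum(map(ord, key)) % num_reducers

def distribute_and_sort_by(rows, num_reducers):
    """Simulate DISTRIBUTE BY user SORT BY num DESC: each reducer sorts only its own rows."""
    buckets = defaultdict(list)
    for user, num in rows:
        buckets[partition(user, num_reducers)].append((user, num))
    output = []
    for r in range(num_reducers):
        # Local sort inside reducer r; reducer outputs are concatenated, not merged.
        output.extend(sorted(buckets[r], key=lambda row: row[1], reverse=True))
    return output

def order_by(rows):
    """Simulate ORDER BY num DESC: one reducer sees every row, so the result is globally sorted."""
    return distribute_and_sort_by(rows, num_reducers=1)

rows = [("alice", 5), ("bob", 9), ("carol", 7), ("dave", 1)]
print(distribute_and_sort_by(rows, num_reducers=2))
# prints [('alice', 5), ('dave', 1), ('bob', 9), ('carol', 7)]  (sorted only per reducer)
print(order_by(rows))
# prints [('bob', 9), ('carol', 7), ('alice', 5), ('dave', 1)]  (globally sorted)
```

With a single reducer the same code path degenerates into a total order, which
mirrors why ORDER BY is described as using one reducer; the trade-off is that
every row must then pass through that one reducer.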


> SORT BY with GROUP ignored without LIMIT
> ----------------------------------------
>
>                 Key: HIVE-647
>                 URL: https://issues.apache.org/jira/browse/HIVE-647
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when 
> a LIMIT is not supplied. If I run the following two queries, the first 
> returns properly sorted results. The second does not.
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC;
> The EXPLAIN output also differs between the two queries: the first uses 3 M/R
> jobs and the second uses only 2, which might be part of the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
