[Hive] group by over a subquery with a cluster by not optimized
---------------------------------------------------------------

                 Key: HADOOP-4415
                 URL: https://issues.apache.org/jira/browse/HADOOP-4415
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/hive
            Reporter: Namit Jain
            Assignee: Namit Jain


Consider the following query:


select x.a, count(x.b) from (select ...... cluster by a) x group by x.a


Even though the user has explicitly asked to cluster by a, the group by still
runs 2 more map-reduce jobs, sorting by a random number and then by a. The
query therefore takes 3 map-reduce jobs in total, sorting by a, by a random
number, and by a again, respectively. Since the subquery output is already
clustered by a, this should be optimized.
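
To illustrate why the extra jobs are redundant, here is a minimal Python
sketch. It is not Hive code and not the planner's actual logic. It assumes that
each reducer receives rows of the form (a, b) with equal values of a already
adjacent, which is what cluster by a is meant to provide. The function name
count_b_per_a and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def count_b_per_a(clustered_rows):
    """Compute count(b) per a in one pass over rows already clustered by a.

    Assumes rows with the same `a` are adjacent (the cluster by guarantee),
    so no further shuffle or sort is needed. Like SQL count(b), NULLs
    (None here) are not counted.
    """
    result = []
    for a, group in groupby(clustered_rows, key=itemgetter(0)):
        result.append((a, sum(1 for _, b in group if b is not None)))
    return result


# Hypothetical output of the subquery as seen by a single reducer.
rows = [(1, "x"), (1, None), (2, "y"), (2, "z"), (3, "w")]
print(count_b_per_a(rows))  # [(1, 1), (2, 2), (3, 1)]
```

Under that assumption the aggregation is a single streaming pass, so the group
by could in principle be evaluated in the same reduce phase as the cluster by.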

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
