[Hive] group by over a subquery with a cluster by not optimized
---------------------------------------------------------------
Key: HADOOP-4415
URL: https://issues.apache.org/jira/browse/HADOOP-4415
Project: Hadoop Core
Issue Type: Bug
Components: contrib/hive
Reporter: Namit Jain
Assignee: Namit Jain

Consider the following query:

  select x.a, count(x.b) from (select ...... cluster by a) x group by x.a

Even though the user has explicitly asked to cluster by a, the group by still runs 2 more map-reduce jobs: one that sorts by a random number, followed by one that sorts by a. In total the query runs 3 map-reduce jobs, sorting by a, by a random number, and by a again. Since the subquery's output is already clustered by a, every row for a given value of a already reaches the same reducer, so the last two jobs are redundant. This should be optimized.
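The difference between the two plans can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not Hive code: the helpers cluster_by, current_plan and optimized_plan and the constant NUM_REDUCERS are hypothetical, a "shuffle" is modeled as hash partitioning plus a per-partition sort, and the random-number stage is modeled as computing partial counts that a final stage on a then sums. It only shows that once rows are clustered by a, a single reduce stage yields the same counts as the 3-job plan described above.

```python
import random
from collections import Counter

NUM_REDUCERS = 3  # hypothetical reducer count for the simulation


def cluster_by(rows, key_fn, num_reducers=NUM_REDUCERS):
    # Simulated shuffle: hash-partition rows on the key, then sort each partition by it.
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        parts[hash(key_fn(row)) % num_reducers].append(row)
    return [sorted(p, key=key_fn) for p in parts]


def current_plan(rows, seed=0):
    """Hypothetical model of the 3-job plan: cluster by a, then random, then a."""
    rng = random.Random(seed)
    # Job 1: the subquery's "cluster by a".
    clustered = [r for part in cluster_by(rows, lambda r: r[0]) for r in part]
    # Job 2: spread rows on a random number; each reducer computes partial
    # count(b) per a (count ignores NULL b, modeled here as None).
    tagged = [(rng.random(), r) for r in clustered]
    partials = []
    for part in cluster_by(tagged, lambda t: t[0]):
        partials.extend(Counter(r[0] for _, r in part if r[1] is not None).items())
    # Job 3: shuffle the partial counts on a and sum them.
    result = {}
    for part in cluster_by(partials, lambda p: p[0]):
        for a, n in part:
            result[a] = result.get(a, 0) + n
    return result


def optimized_plan(rows):
    """Hypothetical single-job plan that reuses the existing clustering on a."""
    result = {}
    # After "cluster by a", all rows for a given a sit in one partition,
    # so each reducer can count its own keys with no further shuffle.
    for part in cluster_by(rows, lambda r: r[0]):
        for a, n in Counter(r[0] for r in part if r[1] is not None).items():
            result[a] = n
    return result


# Rows are (a, b) pairs; None stands in for a NULL b.
rows = [(1, "x"), (2, "y"), (1, None), (3, "z"), (1, "w"), (2, "v")]
print(current_plan(rows))
print(optimized_plan(rows))
```

Both plans report the same per-key counts; the optimized one simply skips the two extra shuffles.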