optimize join followed by a groupby
-----------------------------------
Key: HIVE-1772
URL: https://issues.apache.org/jira/browse/HIVE-1772
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
explain SELECT x.key, count(1) FROM src1 x JOIN src y ON (x.key = y.key) group by x.key;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
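The above query issues 2 map-reduce jobs.
The first MR job performs the join, whereas the second MR job performs the group by.
Since the join key and the group by key are the same (x.key), the rows reaching
each join reducer are already partitioned and sorted on that key, so the group by
can be performed in the reducer of the join itself.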
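
To illustrate the idea, here is a minimal sketch in Python of a single reducer that does the join and the count in one pass. It is not Hive code: the function name join_then_count, the (key, tag) row layout, and the tags "x" and "y" are assumptions made for this example, and the sorted() call stands in for the shuffle sort that map-reduce would already have done.

```python
from itertools import groupby
from operator import itemgetter


def join_then_count(tagged_rows):
    """Simulate one reducer doing an inner join plus count(1) per key.

    tagged_rows: iterable of (key, tag) pairs, where tag is "x" for rows
    from src1 and "y" for rows from src (hypothetical layout).
    """
    # Stand-in for the shuffle: the reducer sees rows sorted by join key.
    ordered = sorted(tagged_rows, key=itemgetter(0))
    results = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        left = right = 0
        for _, tag in group:
            if tag == "x":
                left += 1
            else:
                right += 1
        # The inner join emits left * right rows for this key, so count(1)
        # is available without writing the joined rows to a second job.
        joined = left * right
        if joined:
            results.append((key, joined))
    return results
```

Because all rows for a key arrive together, the aggregate for that key can be emitted as soon as the key changes, which is why no second sort or shuffle is needed in this case.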