[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-2206:
---------------------------
    Attachment: HIVE-2206.8-r1237253.patch.txt

@Kevin, I wrongly assumed that all output names of the ReduceSinkOperator have the structure "KEY/VALUE.internalName". I have fixed this issue. However, the current optimizer cannot handle the case where a table is directly connected to a post-computation operator (in this case, table b connects directly to the join operator). I am planning to solve this issue after this patch. As a workaround, you can use ...

    SET hive.optimize.reducededuplication=false;
    SET hive.optimize.correlation=true;
    SELECT * FROM
      (SELECT * FROM src DISTRIBUTE BY key SORT BY key) a
      JOIN
      (SELECT * FROM src DISTRIBUTE BY key SORT BY key) b
      ON a.key = b.key;

This query will be optimized and executed in a single MapReduce job. Also, I have updated the patch, and it is now compatible with revision 1237253.

> add a new optimizer for query correlation discovery and optimization
> --------------------------------------------------------------------
>
>                 Key: HIVE-2206
>                 URL: https://issues.apache.org/jira/browse/HIVE-2206
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: Yin Huai
>         Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt,
> HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt,
> HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt,
> HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt,
> YSmartPatchForHive.patch, testQueries.2.q
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
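
One way to check the single-job claim for the workaround query is to look at its plan rather than run it. The following is only a sketch: it assumes a Hive build with the HIVE-2206 patch applied and the standard `src` test table, and the thread does not show the plan output, which may differ between builds.

```sql
-- Same settings as in the workaround from the comment.
SET hive.optimize.reducededuplication=false;
SET hive.optimize.correlation=true;

-- EXPLAIN prints the execution plan without running the query.
EXPLAIN
SELECT * FROM
  (SELECT * FROM src DISTRIBUTE BY key SORT BY key) a
  JOIN
  (SELECT * FROM src DISTRIBUTE BY key SORT BY key) b
  ON a.key = b.key;
```

If the correlation optimizer applies, the stage list in the plan would be expected to contain one Map Reduce stage instead of several. Running the same EXPLAIN with `hive.optimize.correlation=false` gives a baseline for comparison.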