[ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2262:
---------------------------------
    Fix Version/s:     (was: 0.7.1)

> mapjoin followed by union all, groupby does not work
> ----------------------------------------------------
>
>                 Key: HIVE-2262
>                 URL: https://issues.apache.org/jira/browse/HIVE-2262
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.1
>            Reporter: yu xiang
>            Priority: Trivial
>
> sql:
> CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN,
>   double_data DOUBLE, string_data STRING)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE nulltest3(int_data1 INT)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> explain select int_data2, count(1) from (
>   select /*+mapjoin(a)*/ int_data2, 1 as c1, 0 as c2
>     from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
>   union all
>   select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2
>     from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
> ) mapjointable
> group by int_data2;
>
> exception:
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
>   at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
>   at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
>   at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
>   at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
>   at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
>   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>
> Analysis of the cause:
> 1. When mapjoin, union, and group by are used together, the optimizer's
>    UnionProcFactory.MapJoinUnion() sets MapJoinSubq to true and sets up the
>    UnionParseContext.
> 2. In GenMRUnion1, Hive calls mergeMapJoinUnion, which also sets the task
>    plan.
> 3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls
>    GenMRRedSink1.process() to initialize the plan. But the utask's plan has
>    already been set at that point; only the reducer still needs to be set.
>    In addition, the utask is processing a temporary table, so no topOp maps
>    to a table. This is what produces the NullPointerException.
>
> Solutions:
> 1. SQL solution: rewrite the query to use a subquery.
> 2. Code solution: in mergeMapJoinUnion, after the task plan has been set,
>    set a settaskplan flag to true to indicate that the plan for this utask
>    has been set. In GenMRRedSink3, if this flag is true, do not call
>    GenMRRedSink1.process() to reinitialize the plan.
> ++++++++++++++++++++++++++++
> if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan())
> ++++++++++++++++++++++++++++
> I don't know whether the code solution is suitable.
> Is there a better solution?
> Thanks.
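
The code solution proposed above can be modelled in isolation. What follows is only a minimal sketch of the guard logic, not Hive code: UnionTaskState, markTaskPlanSet() and PlanGuardSketch are hypothetical stand-ins invented for illustration, and the real UnionParseContext and GenMRRedSink3 classes are not reproduced here. It assumes the flag is set once during the merge step and only read afterwards.

```java
// Hypothetical stand-in for the per-union state; NOT the real Hive
// UnionParseContext. It carries only the two booleans the report mentions.
final class UnionTaskState {
    private final boolean mapOnlySubq;
    private boolean taskPlanSet; // models the proposed "settaskplan" flag

    UnionTaskState(boolean mapOnlySubq) {
        this.mapOnlySubq = mapOnlySubq;
    }

    boolean isMapOnlySubq() {
        return mapOnlySubq;
    }

    boolean isTaskPlanSet() {
        return taskPlanSet;
    }

    // Would be called at the point where the merge step has set the task plan.
    void markTaskPlanSet() {
        taskPlanSet = true;
    }
}

// Hypothetical helper that mirrors the proposed condition
// "isMapOnlySubq() && !isIssetTaskPlan()".
final class PlanGuardSketch {
    private PlanGuardSketch() {
    }

    // True only when the plan still has to be initialized; false once the
    // merge step has already set it, so the plan is not initialized twice.
    static boolean shouldInitPlan(UnionTaskState state) {
        return state.isMapOnlySubq() && !state.isTaskPlanSet();
    }

    public static void main(String[] args) {
        UnionTaskState state = new UnionTaskState(true);
        System.out.println(shouldInitPlan(state)); // before the merge step
        state.markTaskPlanSet();
        System.out.println(shouldInitPlan(state)); // after the merge step
    }
}
```

In this model the second check returns false once the flag is set, which corresponds to skipping the second plan initialization that fails in the stack trace. Whether the remaining reducer setup is then done correctly in Hive itself is not covered by the sketch.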