[ https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916794#action_12916794 ]
Amareshwari Sriramadasu commented on HIVE-1678:
-----------------------------------------------

The same query succeeds when no MapJoin is used. It looks like plan generation went wrong in MapJoinProcessor.

Explain output for the query:
{noformat}
explain select /*+MAPJOIN(src, myinput1) */ count(srcpart.key) from srcpart join src on (srcpart.value=src.value) join myinput1 on (srcpart.key=myinput1.key);
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF srcpart) (TOK_TABREF src) (= (. (TOK_TABLE_OR_COL srcpart) value) (. (TOK_TABLE_OR_COL src) value))) (TOK_TABREF myinput1) (= (. (TOK_TABLE_OR_COL srcpart) key) (. (TOK_TABLE_OR_COL myinput1) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_HINTLIST (TOK_HINT TOK_MAPJOIN (TOK_HINTARGLIST src myinput1))) (TOK_SELEXPR (TOK_FUNCTION count (. (TOK_TABLE_OR_COL srcpart) key))))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        srcpart
          TableScan
            alias: srcpart
            Common Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[value]]
                1 [Column[value]]
              outputColumnNames: _col0
              Position of Big Table: 0
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      Local Work:
        Map Reduce Local Work
          Alias -> Map Local Tables:
            src
              Fetch Operator
                limit: -1
          Alias -> Map Local Operator Tree:
            src
              TableScan
                alias: src
                Common Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {key}
                    1
                  handleSkewJoin: false
                  keys:
                    0 [Column[value]]
                    1 [Column[value]]
                  outputColumnNames: _col0
                  Position of Big Table: 0
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-10-01_11-24-15_198_629889728835692043/-mr-10002
          Select Operator
            expressions:
                  expr: _col0
                  type: int
            outputColumnNames: _col0
            Common Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {_col0}
                1
              handleSkewJoin: false
              keys:
                0 [Column[_col0]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      Local Work:
        Map Reduce Local Work
          Alias -> Map Local Tables:
            myinput1
              Fetch Operator
                limit: -1
          Alias -> Map Local Operator Tree:
            myinput1
              TableScan
                alias: myinput1
                Common Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col0}
                    1
                  handleSkewJoin: false
                  keys:
                    0 [Column[_col0]]
                    1 [Column[key]]
                  outputColumnNames: _col0
                  Position of Big Table: 0
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-10-01_11-24-15_198_629889728835692043/-mr-10002
          Select Operator
            expressions:
                  expr: _col0
                  type: int
            outputColumnNames: _col0
            Common Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {_col0}
                1
              handleSkewJoin: false
              keys:
                0 [Column[_col0]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: bigint
            outputColumnNames: _col0
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 0.202 seconds
{noformat}

If I'm not wrong, the Join operation should not be there in the 3rd stage.

> NPE in MapJoin
> ---------------
>
>                 Key: HIVE-1678
>                 URL: https://issues.apache.org/jira/browse/HIVE-1678
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> The query with two map joins and a group by fails with the following NPE:
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.