[ https://issues.apache.org/jira/browse/HIVE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725783#comment-13725783 ]
Phabricator commented on HIVE-4968:
-----------------------------------

ashutoshc has accepted the revision "HIVE-4968 [jira] When deduplicate multiple SelectOperators, we should update RowResolver accordinly".

  Looks good. Some minor comments.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java:399 You are not using this method. Let's not add this.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java:104 Can you add a comment saying something like "we need to set the row resolver of the parent from the child, which is in the parse context, to preserve column mappings"? Feel free to improve on the wording here.

REVISION DETAIL
  https://reviews.facebook.net/D11901

BRANCH
  HIVE-4968

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, yhuai

> When deduplicating multiple SelectOperators, we should update RowResolver accordingly
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-4968
>                 URL: https://issues.apache.org/jira/browse/HIVE-4968
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-4968.D11901.1.patch
>
>
> {code:sql}
> SELECT tmp3.key, tmp3.value, tmp3.count
> FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
>       FROM (SELECT key, value
>             FROM src) tmp1
>       JOIN (SELECT count(*) as count
>             FROM src) tmp2
>       ) tmp3;
> {code}
> The plan is executable.
> {code:sql}
> SELECT tmp3.key, tmp3.value, tmp3.count
> FROM (SELECT tmp1.key as key, tmp1.value as value, tmp2.count as count
>       FROM (SELECT *
>             FROM src) tmp1
>       JOIN (SELECT count(*) as count
>             FROM src) tmp2
>       ) tmp3;
> {code}
> The plan is executable.
> {code}
> SELECT tmp4.key, tmp4.value, tmp4.count
> FROM (SELECT tmp2.key as key, tmp2.value as value, tmp3.count as count
>       FROM (SELECT *
>             FROM (SELECT key, value
>                   FROM src) tmp1 ) tmp2
>       JOIN (SELECT count(*) as count
>             FROM src) tmp3
>       ) tmp4;
> {code}
> The plan is not executable.
> The plan related to the MapJoin is
> {code}
>   Stage: Stage-5
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         tmp4:tmp2:tmp1:src
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         tmp4:tmp2:tmp1:src
>           TableScan
>             alias: src
>             Select Operator
>               expressions:
>                     expr: key
>                     type: string
>                     expr: value
>                     type: string
>               outputColumnNames: _col0, _col1
>               HashTable Sink Operator
>                 condition expressions:
>                   0
>                   1 {_col0}
>                 handleSkewJoin: false
>                 keys:
>                   0 []
>                   1 []
>                 Position of Big Table: 1
>   Stage: Stage-4
>     Map Reduce
>       Alias -> Map Operator Tree:
>         $INTNAME
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0
>                 1 {_col0}
>               handleSkewJoin: false
>               keys:
>                 0 []
>                 1 []
>               outputColumnNames: _col2
>               Position of Big Table: 1
>               Select Operator
>                 expressions:
>                       expr: _col0
>                       type: string
>                       expr: _col1
>                       type: string
>                       expr: _col2
>                       type: bigint
>                 outputColumnNames: _col0, _col1, _col2
>                 File Output Operator
>                   compressed: false
>                   GlobalTableId: 0
>                   table:
>                       input format: org.apache.hadoop.mapred.TextInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>       Local Work:
>         Map Reduce Local Work
> {code}
> The outputColumnNames of MapJoin is '_col2'. But it should be '_col0, _col1, _col2'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira