[
https://issues.apache.org/jira/browse/HIVE-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933207#comment-14933207
]
Feng Yuan commented on HIVE-9753:
---------------------------------
Gopal V/Sergey Shelukhin Can someone plz review this patch?
> Wrong results when using multiple levels of Joins. When table alias of one of
> the table is null with left outer joins.
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-9753
> URL: https://issues.apache.org/jira/browse/HIVE-9753
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0, 1.0.0
> Reporter: Pavan Srinivas
> Priority: Critical
> Attachments: HIVE-9753.0-0.14.0.patch, HIVE-9753.0-1.0.0.patch,
> HIVE-9753.patch, table1.data, table2.data, table3.data
>
>
> Let take scenario, where the tables are:
> {code}
> drop table table1;
> CREATE TABLE table1(
> col1 string,
> col2 string,
> col3 string,
> col4 string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> drop table table2;
> CREATE TABLE table2(
> col1 string,
> col2 bigint,
> col3 string,
> col4 string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> drop table table3;
> CREATE TABLE table3(
> col1 string,
> col2 int,
> col3 int,
> col4 string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> {code}
> Query with wrong results:
> {code}
> SELECT t1.col1 AS dummy,
> t1.expected_column AS expected_column,
> t2.col4
> FROM (
> SELECT col1,
> '23-11111',
> '23-13' as three,
> col4 AS expected_column
> FROM table1
> ) t1
> JOIN table2 t2
> ON cast(t2.col1 as string) = cast(t1.col1 as string)
> LEFT OUTER JOIN
> (SELECT col4, col1
> FROM table3
> ) t3
> ON t2.col4 = t3.col1
> ;
> {code}
> and explain output:
> {code}
> STAGE DEPENDENCIES:
> Stage-7 is a root stage
> Stage-5 depends on stages: Stage-7
> Stage-0 depends on stages: Stage-5
> STAGE PLANS:
> Stage: Stage-7
> Map Reduce Local Work
> Alias -> Map Local Tables:
> t1:table1
> Fetch Operator
> limit: -1
> t3:table3
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> t1:table1
> TableScan
> alias: table1
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Filter Operator
> predicate: col1 is not null (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Select Operator
> expressions: col1 (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> HashTable Sink Operator
> condition expressions:
> 0
> 1 {col4}
> keys:
> 0 _col0 (type: string)
> 1 col1 (type: string)
> t3:table3
> TableScan
> alias: table3
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Select Operator
> expressions: col1 (type: string)
> outputColumnNames: _col1
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> HashTable Sink Operator
> condition expressions:
> 0 {_col0} {_col7} {_col7}
> 1
> keys:
> 0 _col7 (type: string)
> 1 _col1 (type: string)
> Stage: Stage-5
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t2
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Filter Operator
> predicate: col1 is not null (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col0}
> 1 {col4}
> keys:
> 0 _col0 (type: string)
> 1 col1 (type: string)
> outputColumnNames: _col0, _col7
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Map Join Operator
> condition map:
> Left Outer Join0 to 1
> condition expressions:
> 0 {_col0} {_col7} {_col7}
> 1
> keys:
> 0 _col7 (type: string)
> 1 _col1 (type: string)
> outputColumnNames: _col0, _col3, _col7
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE
> Column stats: NONE
> Select Operator
> expressions: _col0 (type: string), _col3 (type: string),
> _col7 (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE
> Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE
> Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> After the patch, explain output(correct)
> {code}
> STAGE DEPENDENCIES:
> Stage-7 is a root stage
> Stage-5 depends on stages: Stage-7
> Stage-0 depends on stages: Stage-5
> STAGE PLANS:
> Stage: Stage-7
> Map Reduce Local Work
> Alias -> Map Local Tables:
> t1:table1
> Fetch Operator
> limit: -1
> t3:table3
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> t1:table1
> TableScan
> alias: table1
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Filter Operator
> predicate: col1 is not null (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Select Operator
> expressions: col1 (type: string), col4 (type: string)
> outputColumnNames: _col0, _col3
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> HashTable Sink Operator
> condition expressions:
> 0 {_col3}
> 1 {col4}
> keys:
> 0 _col0 (type: string)
> 1 col1 (type: string)
> t3:table3
> TableScan
> alias: table3
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Select Operator
> expressions: col1 (type: string)
> outputColumnNames: _col1
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> HashTable Sink Operator
> condition expressions:
> 0 {_col0} {_col3} {_col7}
> 1
> keys:
> 0 _col7 (type: string)
> 1 _col1 (type: string)
> Stage: Stage-5
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t2
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Filter Operator
> predicate: col1 is not null (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col0} {_col3}
> 1 {col4}
> keys:
> 0 _col0 (type: string)
> 1 col1 (type: string)
> outputColumnNames: _col0, _col3, _col7
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
> stats: NONE
> Map Join Operator
> condition map:
> Left Outer Join0 to 1
> condition expressions:
> 0 {_col0} {_col3} {_col7}
> 1
> keys:
> 0 _col7 (type: string)
> 1 _col1 (type: string)
> outputColumnNames: _col0, _col3, _col7
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE
> Column stats: NONE
> Select Operator
> expressions: _col0 (type: string), _col3 (type: string),
> _col7 (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE
> Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE
> Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> Notice the difference of "_col3" between the two explain outputs.
> Output of the query:(WRONG)
> {code}
> one bad_data_1 bad_data_1
> one bad_data_1 bad_data_1
> one bad_data_1 bad_data_1
> one bad_data_1 bad_data_1
> two bad_data_2 bad_data_2
> two bad_data_2 bad_data_2
> two bad_data_2 bad_data_2
> two bad_data_2 bad_data_2
> one bad_data_1 bad_data_1
> one bad_data_1 bad_data_1
> one bad_data_1 bad_data_1
> one bad_data_1 bad_data_1
> two bad_data_2 bad_data_2
> two bad_data_2 bad_data_2
> two bad_data_2 bad_data_2
> two bad_data_2 bad_data_2
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)