[jira] [Updated] (HIVE-9753) Wrong results when using multiple levels of Joins. When table alias of one of the table is null with left outer joins.

Pavan Srinivas (JIRA) Mon, 23 Feb 2015 11:29:25 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Pavan Srinivas updated HIVE-9753:
---------------------------------
    Attachment: HIVE-9753.patch
                HIVE-9753.0-0.14.0.patch
                HIVE-9753.0-1.0.0.patch

> Wrong results when using multiple levels of Joins. When table alias of one of 
> the table is null with left outer joins.  
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9753
>                 URL: https://issues.apache.org/jira/browse/HIVE-9753
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 1.0.0
>            Reporter: Pavan Srinivas
>            Priority: Critical
>         Attachments: HIVE-9753.0-0.14.0.patch, HIVE-9753.0-1.0.0.patch, 
> HIVE-9753.patch, table1.data, table2.data, table3.data
>
>
> Let take scenario, where the tables are:
> {code}
> drop table table1;
> CREATE TABLE table1(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string
>   )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> drop table table2;
> CREATE  TABLE table2(
>   col1 string,
>   col2 bigint,
>   col3 string,
>   col4 string
>   )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> drop table table3;
> CREATE  TABLE table3(
>   col1 string,
>   col2 int,
>   col3 int,
>   col4 string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> {code}
> Query with wrong results:
> {code}
> SELECT t1.col1 AS dummy,
>     t1.expected_column AS expected_column,
>     t2.col4
> FROM (
>     SELECT col1,
>         '23-11111',
>         '23-13' as three,
>         col4 AS expected_column
>     FROM table1
>     ) t1
> JOIN table2 t2
>     ON cast(t2.col1 as string) = cast(t1.col1 as string)
> LEFT OUTER JOIN
>     (SELECT col4, col1
>     FROM table3
>     ) t3
>     ON t2.col4 = t3.col1  
> ;
> {code}
> and explain output: 
> {code}
> STAGE DEPENDENCIES:
>   Stage-7 is a root stage
>   Stage-5 depends on stages: Stage-7
>   Stage-0 depends on stages: Stage-5
> STAGE PLANS:
>   Stage: Stage-7
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t1:table1
>           Fetch Operator
>             limit: -1
>         t3:table3
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t1:table1
>           TableScan
>             alias: table1
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>             Filter Operator
>               predicate: col1 is not null (type: boolean)
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>               Select Operator
>                 expressions: col1 (type: string)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>                 HashTable Sink Operator
>                   condition expressions:
>                     0
>                     1 {col4}
>                   keys:
>                     0 _col0 (type: string)
>                     1 col1 (type: string)
>         t3:table3
>           TableScan
>             alias: table3
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>             Select Operator
>               expressions: col1 (type: string)
>               outputColumnNames: _col1
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {_col0} {_col7} {_col7}
>                   1
>                 keys:
>                   0 _col7 (type: string)
>                   1 _col1 (type: string)
>   Stage: Stage-5
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>             Filter Operator
>               predicate: col1 is not null (type: boolean)
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>               Map Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {_col0}
>                   1 {col4}
>                 keys:
>                   0 _col0 (type: string)
>                   1 col1 (type: string)
>                 outputColumnNames: _col0, _col7
>                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>                 Map Join Operator
>                   condition map:
>                        Left Outer Join0 to 1
>                   condition expressions:
>                     0 {_col0} {_col7} {_col7}
>                     1
>                   keys:
>                     0 _col7 (type: string)
>                     1 _col1 (type: string)
>                   outputColumnNames: _col0, _col3, _col7
>                   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>                   Select Operator
>                     expressions: _col0 (type: string), _col3 (type: string), 
> _col7 (type: string)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>                       table:
>                           input format: 
> org.apache.hadoop.mapred.TextInputFormat
>                           output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> After the patch, explain output(correct)
> {code}
> STAGE DEPENDENCIES:
>   Stage-7 is a root stage
>   Stage-5 depends on stages: Stage-7
>   Stage-0 depends on stages: Stage-5
> STAGE PLANS:
>   Stage: Stage-7
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t1:table1
>           Fetch Operator
>             limit: -1
>         t3:table3
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t1:table1
>           TableScan
>             alias: table1
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>             Filter Operator
>               predicate: col1 is not null (type: boolean)
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>               Select Operator
>                 expressions: col1 (type: string), col4 (type: string)
>                 outputColumnNames: _col0, _col3
>                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>                 HashTable Sink Operator
>                   condition expressions:
>                     0 {_col3}
>                     1 {col4}
>                   keys:
>                     0 _col0 (type: string)
>                     1 col1 (type: string)
>         t3:table3
>           TableScan
>             alias: table3
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>             Select Operator
>               expressions: col1 (type: string)
>               outputColumnNames: _col1
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {_col0} {_col3} {_col7}
>                   1
>                 keys:
>                   0 _col7 (type: string)
>                   1 _col1 (type: string)
>   Stage: Stage-5
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>             Filter Operator
>               predicate: col1 is not null (type: boolean)
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>               Map Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {_col0} {_col3}
>                   1 {col4}
>                 keys:
>                   0 _col0 (type: string)
>                   1 col1 (type: string)
>                 outputColumnNames: _col0, _col3, _col7
>                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>                 Map Join Operator
>                   condition map:
>                        Left Outer Join0 to 1
>                   condition expressions:
>                     0 {_col0} {_col3} {_col7}
>                     1
>                   keys:
>                     0 _col7 (type: string)
>                     1 _col1 (type: string)
>                   outputColumnNames: _col0, _col3, _col7
>                   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>                   Select Operator
>                     expressions: _col0 (type: string), _col3 (type: string), 
> _col7 (type: string)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>                       table:
>                           input format: 
> org.apache.hadoop.mapred.TextInputFormat
>                           output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Notice the difference of "_col3" between the two explain outputs. 
> Output of the query:(WRONG)
> {code}
> one   bad_data_1      bad_data_1
> one   bad_data_1      bad_data_1
> one   bad_data_1      bad_data_1
> one   bad_data_1      bad_data_1
> two   bad_data_2      bad_data_2
> two   bad_data_2      bad_data_2
> two   bad_data_2      bad_data_2
> two   bad_data_2      bad_data_2
> one   bad_data_1      bad_data_1
> one   bad_data_1      bad_data_1
> one   bad_data_1      bad_data_1
> one   bad_data_1      bad_data_1
> two   bad_data_2      bad_data_2
> two   bad_data_2      bad_data_2
> two   bad_data_2      bad_data_2
> two   bad_data_2      bad_data_2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9753) Wrong results when using multiple levels of Joins. When table alias of one of the table is null with left outer joins.

Reply via email to