[jira] [Commented] (HIVE-13693) Multi-insert query drops Filter before file output when there is a.val <> b.val

zhaolong (Jira) Wed, 21 Oct 2020 19:50:49 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218703#comment-17218703
 ]


zhaolong commented on HIVE-13693:
---------------------------------

[~jcamachorodriguez] hello  I found this patch may cause a semi join to left 
outer join?  NOT EXISTS query result when cbo disabled is not correct. 

the left is 3.1.0 branch,  the right is 1.2.1 branch, and query like select ... 
from a,b,c,d .. where a.xx=b.xx and c.a='' and c.b='' NOT EXIST(select 1 from d 
where d.xx='')。 the semi join are convert to left join, and some filter 
condition are missed

!image-2020-10-22-10-46-04-195.png!

> Multi-insert query drops Filter before file output when there is a.val <> 
> b.val
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-13693
>                 URL: https://issues.apache.org/jira/browse/HIVE-13693
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 1.3.0, 2.0.0, 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13693.01.patch, HIVE-13693.01.patch, 
> HIVE-13693.02.patch, HIVE-13693.patch
>
>
> To reproduce:
> {noformat}
> CREATE TABLE T_A ( id STRING, val STRING ); 
> CREATE TABLE T_B ( id STRING, val STRING ); 
> CREATE TABLE join_result_1 ( ida STRING, vala STRING, idb STRING, valb STRING 
> ); 
> CREATE TABLE join_result_3 ( ida STRING, vala STRING, idb STRING, valb STRING 
> ); 
> INSERT INTO TABLE T_A 
> VALUES ('Id_1', 'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103'); 
> INSERT INTO TABLE T_B 
> VALUES ('Id_1', 'val_103'), ('Id_2', 'val_104'); 
> explain
> FROM T_A a LEFT JOIN T_B b ON a.id = b.id
> INSERT OVERWRITE TABLE join_result_1
> SELECT a.*, b.*
> WHERE b.id = 'Id_1' AND b.val = 'val_103'
> INSERT OVERWRITE TABLE join_result_3
> SELECT a.*, b.*
> WHERE b.val = 'val_104' AND b.id = 'Id_2' AND a.val <> b.val;
> {noformat}
> The (wrong) plan is the following:
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-3 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-3
>   Stage-4 depends on stages: Stage-0
>   Stage-1 depends on stages: Stage-3
>   Stage-5 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Tez
>       DagId: haha_20160504140944_174465c9-5d1a-42f9-9665-fae02eeb2767:2
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
>       DagName: 
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   Statistics: Num rows: 3 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: id (type: string)
>                     sort order: +
>                     Map-reduce partition columns: id (type: string)
>                     Statistics: Num rows: 3 Data size: 36 Basic stats: 
> COMPLETE Column stats: NONE
>                     value expressions: val (type: string)
>         Map 3 
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   Statistics: Num rows: 2 Data size: 24 Basic stats: COMPLETE 
> Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: id (type: string)
>                     sort order: +
>                     Map-reduce partition columns: id (type: string)
>                     Statistics: Num rows: 2 Data size: 24 Basic stats: 
> COMPLETE Column stats: NONE
>                     value expressions: val (type: string)
>         Reducer 2 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Left Outer Join0 to 1
>                 keys:
>                   0 id (type: string)
>                   1 id (type: string)
>                 outputColumnNames: _col0, _col1, _col6
>                 Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE 
> Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: string), _col1 (type: string), 
> 'Id_1' (type: string), 'val_103' (type: string)
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE 
> Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 3 Data size: 39 Basic stats: 
> COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                         name: bugtest2.join_result_1
>                 Filter Operator
>                   predicate: (_col1 <> _col6) (type: boolean)
>                   Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE 
> Column stats: NONE
>                   Select Operator
>                     expressions: _col0 (type: string), _col1 (type: string), 
> 'Id_2' (type: string), 'val_104' (type: string)
>                     outputColumnNames: _col0, _col1, _col2, _col3
>                     Statistics: Num rows: 3 Data size: 39 Basic stats: 
> COMPLETE Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 3 Data size: 39 Basic stats: 
> COMPLETE Column stats: NONE
>                       table:
>                           input format: 
> org.apache.hadoop.mapred.TextInputFormat
>                           output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                           name: bugtest2.join_result_3
>   Stage: Stage-3
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: bugtest2.join_result_1
>   Stage: Stage-4
>     Stats-Aggr Operator
>   Stage: Stage-1
>     Move Operator
>       tables:
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: bugtest2.join_result_3
>   Stage: Stage-5
>     Stats-Aggr Operator
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-13693) Multi-insert query drops Filter before file output when there is a.val <> b.val

Reply via email to