[
https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zengxl updated HIVE-21452:
--------------------------
Affects Version/s: 3.1.0
2.3.4
> Loss of query condition when exist exists
> -----------------------------------------
>
> Key: HIVE-21452
> URL: https://issues.apache.org/jira/browse/HIVE-21452
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.0, 3.1.0, 2.3.4
> Reporter: zengxl
> Priority: Major
>
> In our production environment, there are four tables to do association
> queries. There are exists in the conditions, and we found that the first two
> conditions were lost.
> The following two conditions are missing:
> {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
>
> In the test environment, I prepared the data of the following four tables,
> and a condition was lost in the simulation.
> tables:
> test_table1
> cust_no,name
> 60001,lisa
> 60002,tina
> 60003,kylin
> 60004,jeny
> 60005,john
> 60006,jamse
> test_table2
> cust_no,acct_type
> 60001,1
> 60001,1
> 60001,2
> 60002,1
> 60003,2
> 60003,3
> test_table3
> cust_no
> 60001
> 60002
> 60003
> 60004
> 60005
> 60007
> test_table4
> cust_no,cust_level
> 60001,1
> 60002,2
> 60003,3
> 60004,4
> 60005,5
>
> create table tmp.test_table1(cust_no string,name string);
> create table tmp.test_table2(cust_no string,acct_type string);
> create table tmp.test_table3(cust_no string);
> create table tmp.test_table4(cust_no string,cust_level string);
> insert into tmp.test_table1 select '60001','lisa';
> insert into tmp.test_table1 select '60002','tina';
> insert into tmp.test_table1 select '60003','kylin';
> insert into tmp.test_table1 select '60004','jeny';
> insert into tmp.test_table1 select '60005','john';
> insert into tmp.test_table1 select '60006','jamse';
> insert into tmp.test_table2 select '60001','1';
> insert into tmp.test_table2 select '60001','1';
> insert into tmp.test_table2 select '60001','2';
> insert into tmp.test_table2 select '60002','1';
> insert into tmp.test_table2 select '60003','2';
> insert into tmp.test_table2 select '60002','3';
> insert into tmp.test_table3 select '60001';
> insert into tmp.test_table3 select '60002';
> insert into tmp.test_table3 select '60003';
> insert into tmp.test_table3 select '60004';
> insert into tmp.test_table3 select '60005';
> insert into tmp.test_table3 select '60007';
> insert into tmp.test_table4 select '60001','1';
> insert into tmp.test_table4 select '60002','2';
> insert into tmp.test_table4 select '60003','3';
> insert into tmp.test_table4 select '60004','4';
> insert into tmp.test_table4 select '60005','5';
>
> Here is my query SQL And shut down mapjoin:
> set hive.auto.convert.join=false;
> select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from
> tmp.test_table1 t1
> left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
> and t2.acct_type='1'
> left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
> where t2.cust_no is null and t4.cust_level not in ('4','5') and exists
> (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)
>
> All I want is to include cust_no for 6003,But the result is inclusive 6004
> and 6005,this wrong 。{color:#f79232}In my production environment, 6001 came
> out。Loss of condition because cust_no is null。{color}
> {color:#f6c342}View the execution plan, t4.cust_level not in ('4','5')
> condition missing{color}
> *{color:#f6c342}Explain:{color}*
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 depends on stages: Stage-1
> Stage-0 depends on stages: Stage-2
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t1
> Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats:
> NONE
> Filter Operator
> predicate: cust_no is not null (type: boolean)
> Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats:
> NONE
> Reduce Output Operator
> key expressions: cust_no (type: string)
> sort order: +
> Map-reduce partition columns: cust_no (type: string)
> Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats:
> NONE
> value expressions: name (type: string)
> TableScan
> alias: t2
> Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats:
> NONE
> Filter Operator
> predicate: ((acct_type = '1') and cust_no is not null) (type: boolean)
> Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats:
> NONE
> Reduce Output Operator
> key expressions: cust_no (type: string)
> sort order: +
> Map-reduce partition columns: cust_no (type: string)
> Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats:
> NONE
> TableScan
> alias: t4
> Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE Column stats:
> NONE
> Filter Operator
> predicate: cust_no is not null (type: boolean)
> Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats:
> NONE
> Reduce Output Operator
> key expressions: cust_no (type: string)
> sort order: +
> Map-reduce partition columns: cust_no (type: string)
> Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats:
> NONE
> Reduce Operator Tree:
> Join Operator
> condition map:
> Left Outer Join0 to 1
> Left Outer Join0 to 2
> keys:
> 0 cust_no (type: string)
> 1 cust_no (type: string)
> 2 cust_no (type: string)
> outputColumnNames: _col0, _col1, _col5
> Statistics: Num rows: 13 Data size: 2340 Basic stats: COMPLETE Column stats:
> NONE
> Filter Operator
> predicate: _col5 is null (type: boolean)
> Statistics: Num rows: 6 Data size: 1080 Basic stats: COMPLETE Column stats:
> NONE
> File Output Operator
> compressed: false
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> Stage: Stage-2
> Map Reduce
> Map Operator Tree:
> TableScan
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 6 Data size: 1080 Basic stats: COMPLETE Column stats:
> NONE
> value expressions: _col1 (type: string)
> TableScan
> alias: t3
> Statistics: Num rows: 12 Data size: 1068 Basic stats: COMPLETE Column stats:
> NONE
> Filter Operator
> predicate: cust_no is not null (type: boolean)
> Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats:
> NONE
> Select Operator
> expressions: cust_no (type: string)
> outputColumnNames: _col1
> Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats:
> NONE
> Group By Operator
> keys: _col1 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats:
> NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats:
> NONE
> Reduce Operator Tree:
> Join Operator
> condition map:
> Left Semi Join 0 to 1
> keys:
> 0 _col0 (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats:
> NONE
> Select Operator
> expressions: _col0 (type: string), null (type: string), _col1 (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats:
> NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats:
> NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
>
> {color:#f6c342}*result:*{color}
> 60003 NULL kylin
> 60003 NULL kylin
> 60003 NULL kylin
> 60004 NULL jeny
> 60005 NULL john
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)