[jira] [Updated] (HIVE-21452) Loss of query condition when exist exists

zengxl (JIRA) Wed, 27 Mar 2019 18:08:20 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


zengxl updated HIVE-21452:
--------------------------
    Affects Version/s: 3.1.0
                       2.3.4

> Loss of query condition when exist exists
> -----------------------------------------
>
>                 Key: HIVE-21452
>                 URL: https://issues.apache.org/jira/browse/HIVE-21452
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 3.1.0, 2.3.4
>            Reporter: zengxl
>            Priority: Major
>
> In our production environment, there are four tables to do association 
> queries. There are exists in the conditions, and we found that the first two 
> conditions were lost.
> The following two conditions are missing：
> {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
>   
>  In the test environment, I prepared the data of the following four tables, 
> and a condition was lost in the simulation.
> tables:
> test_table1
>  cust_no,name
>  60001,lisa
>  60002,tina
>  60003,kylin
>  60004,jeny
>  60005,john
>  60006,jamse
> test_table2
>  cust_no,acct_type
>  60001,1
>  60001,1
>  60001,2
>  60002,1
>  60003,2
>  60003,3
> test_table3
>  cust_no
>  60001
>  60002
>  60003
>  60004
>  60005
>  60007
> test_table4
>  cust_no,cust_level
>  60001,1
>  60002,2
>  60003,3
>  60004,4
>  60005,5
>  
> create table tmp.test_table1(cust_no string,name string);
>  create table tmp.test_table2(cust_no string,acct_type string);
>  create table tmp.test_table3(cust_no string);
>  create table tmp.test_table4(cust_no string,cust_level string);
> insert into tmp.test_table1 select '60001','lisa';
>  insert into tmp.test_table1 select '60002','tina';
>  insert into tmp.test_table1 select '60003','kylin';
>  insert into tmp.test_table1 select '60004','jeny';
>  insert into tmp.test_table1 select '60005','john';
>  insert into tmp.test_table1 select '60006','jamse';
> insert into tmp.test_table2 select '60001','1';
>  insert into tmp.test_table2 select '60001','1';
>  insert into tmp.test_table2 select '60001','2';
>  insert into tmp.test_table2 select '60002','1';
>  insert into tmp.test_table2 select '60003','2';
>  insert into tmp.test_table2 select '60002','3';
> insert into tmp.test_table3 select '60001';
>  insert into tmp.test_table3 select '60002';
>  insert into tmp.test_table3 select '60003';
>  insert into tmp.test_table3 select '60004';
>  insert into tmp.test_table3 select '60005';
>  insert into tmp.test_table3 select '60007';
> insert into tmp.test_table4 select '60001','1';
>  insert into tmp.test_table4 select '60002','2';
>  insert into tmp.test_table4 select '60003','3';
>  insert into tmp.test_table4 select '60004','4';
>  insert into tmp.test_table4 select '60005','5';
>   
>  Here is my query SQL And shut down mapjoin:
> set hive.auto.convert.join=false;
> select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from 
> tmp.test_table1 t1
>  left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
>  and t2.acct_type='1'
>  left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
>  where t2.cust_no is null and t4.cust_level not in ('4','5') and exists 
> (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)
>  
> All I want is to include cust_no for 6003，But the result is inclusive 6004 
> and 6005，this wrong 。{color:#f79232}In my production environment, 6001 came 
> out。Loss of condition because cust_no is  null。{color}
> {color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') 
> condition missing{color}
> *{color:#f6c342}Explain:{color}*
> STAGE DEPENDENCIES:
>  Stage-1 is a root stage
>  Stage-2 depends on stages: Stage-1
>  Stage-0 depends on stages: Stage-2
> STAGE PLANS:
>  Stage: Stage-1
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: t1
>  Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: cust_no is not null (type: boolean)
>  Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: 
> NONE
>  value expressions: name (type: string)
>  TableScan
>  alias: t2
>  Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: ((acct_type = '1') and cust_no is not null) (type: boolean)
>  Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: 
> NONE
>  TableScan
>  alias: t4
>  Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: cust_no is not null (type: boolean)
>  Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: cust_no (type: string)
>  sort order: +
>  Map-reduce partition columns: cust_no (type: string)
>  Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Operator Tree:
>  Join Operator
>  condition map:
>  Left Outer Join0 to 1
>  Left Outer Join0 to 2
>  keys:
>  0 cust_no (type: string)
>  1 cust_no (type: string)
>  2 cust_no (type: string)
>  outputColumnNames: _col0, _col1, _col5
>  Statistics: Num rows: 13 Data size: 2340 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: _col5 is null (type: boolean)
>  Statistics: Num rows: 6 Data size: 1080 Basic stats: COMPLETE Column stats: 
> NONE
>  File Output Operator
>  compressed: false
>  table:
>  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> Stage: Stage-2
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  Reduce Output Operator
>  key expressions: _col0 (type: string)
>  sort order: +
>  Map-reduce partition columns: _col0 (type: string)
>  Statistics: Num rows: 6 Data size: 1080 Basic stats: COMPLETE Column stats: 
> NONE
>  value expressions: _col1 (type: string)
>  TableScan
>  alias: t3
>  Statistics: Num rows: 12 Data size: 1068 Basic stats: COMPLETE Column stats: 
> NONE
>  Filter Operator
>  predicate: cust_no is not null (type: boolean)
>  Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: 
> NONE
>  Select Operator
>  expressions: cust_no (type: string)
>  outputColumnNames: _col1
>  Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: 
> NONE
>  Group By Operator
>  keys: _col1 (type: string)
>  mode: hash
>  outputColumnNames: _col0
>  Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Output Operator
>  key expressions: _col0 (type: string)
>  sort order: +
>  Map-reduce partition columns: _col0 (type: string)
>  Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: 
> NONE
>  Reduce Operator Tree:
>  Join Operator
>  condition map:
>  Left Semi Join 0 to 1
>  keys:
>  0 _col0 (type: string)
>  1 _col0 (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats: 
> NONE
>  Select Operator
>  expressions: _col0 (type: string), null (type: string), _col1 (type: string)
>  outputColumnNames: _col0, _col1, _col2
>  Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats: 
> NONE
>  File Output Operator
>  compressed: false
>  Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats: 
> NONE
>  table:
>  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
>  Fetch Operator
>  limit: -1
>  Processor Tree:
>  ListSink
>  
> {color:#f6c342}*result:*{color}
> 60003 NULL kylin
>  60003 NULL kylin
>  60003 NULL kylin
>  60004 NULL jeny
>  60005 NULL john
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21452) Loss of query condition when exist exists

Reply via email to