[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

Alexander Pivovarov (JIRA) Thu, 28 May 2015 22:50:45 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564243#comment-14564243
 ]


Alexander Pivovarov commented on HIVE-10841:
--------------------------------------------

I found that "JOIN FR" can be removed - the result still will be 3 rows
But adding or removing "JOIN PI" changes Filter Operator predicate for acct 
table

if we remove "JOIN PI" then acct table Filter Operator predicate has "brn is 
not null" and query returns 1 row
{code}
        acct 
          TableScan
            alias: acct
            Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column 
stats: NONE
            Filter Operator
              predicate: (aid is not null and brn is not null) (type: boolean)
{code}

How it can be possible that removing "JOIN PI" changes Filter Operator 
predicate for acct table?

The query below returns 1 row. Query plan has "brn is not null" predicate in 
Filter Operator for acct table.
But if we remove comment before "JOIN PI" then query plan will not have "brn is 
not null" predicate.
{code}
explain select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN A ON LA.aid = A.id
--JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}


> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-10841
>                 URL: https://issues.apache.org/jira/browse/HIVE-10841
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning, Query Processor
>    Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>            Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10    122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10    122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a 
>           Fetch Operator
>             limit: -1
>         acct 
>           Fetch Operator
>             limit: -1
>         fr 
>           Fetch Operator
>             limit: -1
>         l 
>           Fetch Operator
>             limit: -1
>         pi 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a 
>           TableScan
>             alias: a
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
>             Filter Operator
>               predicate: id is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col5 (type: int)
>                   1 id (type: int)
>                   2 aid (type: int)
>         acct 
>           TableScan
>             alias: acct
>             Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
>             Filter Operator
>               predicate: aid is not null (type: boolean)
>               Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col5 (type: int)
>                   1 id (type: int)
>                   2 aid (type: int)
>         fr 
>           TableScan
>             alias: fr
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
>             Filter Operator
>               predicate: (loan_id = 4436) (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 4436 (type: int)
>                   1 4436 (type: int)
>                   2 4436 (type: int)
>         l 
>           TableScan
>             alias: l
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
>             Filter Operator
>               predicate: (id = 4436) (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 4436 (type: int)
>                   1 4436 (type: int)
>                   2 4436 (type: int)
>         pi 
>           TableScan
>             alias: pi
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
>             Filter Operator
>               predicate: id is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col6 (type: int)
>                   1 id (type: int)
>   Stage: Stage-9
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: la
>             Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE 
> Column stats: NONE
>             Filter Operator
>               predicate: (((loan_id is not null and aid is not null) and 
> pi_id is not null) and (loan_id = 4436)) (type: boolean)
>               Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE 
> Column stats: NONE
>               Map Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                      Inner Join 0 to 2
>                 keys:
>                   0 4436 (type: int)
>                   1 4436 (type: int)
>                   2 4436 (type: int)
>                 outputColumnNames: _col5, _col6
>                 Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
> Column stats: NONE
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                        Inner Join 1 to 2
>                   keys:
>                     0 _col5 (type: int)
>                     1 id (type: int)
>                     2 aid (type: int)
>                   outputColumnNames: _col6, _col19, _col20
>                   Statistics: Num rows: 4 Data size: 17 Basic stats: COMPLETE 
> Column stats: NONE
>                   Map Join Operator
>                     condition map:
>                          Inner Join 0 to 1
>                     keys:
>                       0 _col6 (type: int)
>                       1 id (type: int)
>                     outputColumnNames: _col19, _col20
>                     Statistics: Num rows: 4 Data size: 18 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col19 (type: int), _col20 (type: int)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 4 Data size: 18 Basic stats: 
> COMPLETE Column stats: NONE
>                       File Output Operator
>                         compressed: false
>                         Statistics: Num rows: 4 Data size: 18 Basic stats: 
> COMPLETE Column stats: NONE
>                         table:
>                             input format: 
> org.apache.hadoop.mapred.TextInputFormat
>                             output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                             serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 0.57 seconds, Fetched: 142 row(s)
> {code}
> 2.2. "explain select..." output for hive-0.13.1 Tez
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE), Reducer 6 
> (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)
>         Reducer 6 <- Map 5 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 8 
> (SIMPLE_EDGE)
>       DagName: lcapp_20150528111717_06c57a5b-8dc6-4ce9-bce7-b9e0a7818fe4:1
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: acct
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: aid (type: int)
>                     sort order: +
>                     Map-reduce partition columns: aid (type: int)
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: 
> COMPLETE Column stats: NONE
>                     value expressions: acc_n (type: int), brn (type: int)
>         Map 4 
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   Statistics: Num rows: 46 Data size: 187 Basic stats: 
> COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: id (type: int)
>                     sort order: +
>                     Map-reduce partition columns: id (type: int)
>                     Statistics: Num rows: 46 Data size: 187 Basic stats: 
> COMPLETE Column stats: NONE
>         Map 5 
>             Map Operator Tree:
>                 TableScan
>                   alias: la
>                   Statistics: Num rows: 28 Data size: 347 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (loan_id = 4436) (type: boolean)
>                     Statistics: Num rows: 14 Data size: 173 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: loan_id (type: int)
>                       sort order: +
>                       Map-reduce partition columns: loan_id (type: int)
>                       Statistics: Num rows: 14 Data size: 173 Basic stats: 
> COMPLETE Column stats: NONE
>                       value expressions: aid (type: int), pi_id (type: int)
>         Map 7 
>             Map Operator Tree:
>                 TableScan
>                   alias: fr
>                   Statistics: Num rows: 46 Data size: 187 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (loan_id = 4436) (type: boolean)
>                     Statistics: Num rows: 23 Data size: 93 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: loan_id (type: int)
>                       sort order: +
>                       Map-reduce partition columns: loan_id (type: int)
>                       Statistics: Num rows: 23 Data size: 93 Basic stats: 
> COMPLETE Column stats: NONE
>         Map 8 
>             Map Operator Tree:
>                 TableScan
>                   alias: l
>                   Statistics: Num rows: 46 Data size: 187 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (id = 4436) (type: boolean)
>                     Statistics: Num rows: 23 Data size: 93 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: id (type: int)
>                       sort order: +
>                       Map-reduce partition columns: id (type: int)
>                       Statistics: Num rows: 23 Data size: 93 Basic stats: 
> COMPLETE Column stats: NONE
>         Map 9 
>             Map Operator Tree:
>                 TableScan
>                   alias: pi
>                   Statistics: Num rows: 46 Data size: 187 Basic stats: 
> COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: id (type: int)
>                     sort order: +
>                     Map-reduce partition columns: id (type: int)
>                     Statistics: Num rows: 46 Data size: 187 Basic stats: 
> COMPLETE Column stats: NONE
>         Reducer 2 
>             Reduce Operator Tree:
>               Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                      Inner Join 1 to 2
>                 condition expressions:
>                   0 {VALUE._col2}
>                   1 
>                   2 {VALUE._col1} {VALUE._col2}
>                 outputColumnNames: _col2, _col15, _col16
>                 Statistics: Num rows: 110 Data size: 448 Basic stats: 
> COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col2 (type: int)
>                   sort order: +
>                   Map-reduce partition columns: _col2 (type: int)
>                   Statistics: Num rows: 110 Data size: 448 Basic stats: 
> COMPLETE Column stats: NONE
>                   value expressions: _col15 (type: int), _col16 (type: int)
>         Reducer 3 
>             Reduce Operator Tree:
>               Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col1} {VALUE._col2}
>                   1 
>                 outputColumnNames: _col1, _col2
>                 Statistics: Num rows: 121 Data size: 492 Basic stats: 
> COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col1 (type: int), _col2 (type: int)
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 121 Data size: 492 Basic stats: 
> COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 121 Data size: 492 Basic stats: 
> COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 6 
>             Reduce Operator Tree:
>               Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                      Inner Join 0 to 2
>                 condition expressions:
>                   0 
>                   1 {VALUE._col1} {VALUE._col2}
>                   2 
>                 outputColumnNames: _col4, _col5
>                 Statistics: Num rows: 50 Data size: 204 Basic stats: COMPLETE 
> Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col4 (type: int)
>                   sort order: +
>                   Map-reduce partition columns: _col4 (type: int)
>                   Statistics: Num rows: 50 Data size: 204 Basic stats: 
> COMPLETE Column stats: NONE
>                   value expressions: _col5 (type: int)
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> Time taken: 1.377 seconds, Fetched: 146 row(s)
> {code}
> 3. The workaround is to put "acct.brn is not null" to join condition
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid and acct.brn is not null
> WHERE
>   L.id = 4436;
> OK
> 10    122
> Time taken: 23.479 seconds, Fetched: 1 row(s)
> {code}
> I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations 
> have the issue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

Reply via email to