[
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Pivovarov updated HIVE-10841:
---------------------------------------
Summary: [WHERE col is not null] does not work sometimes for queries with
many JOIN statements (was: [WHERE col is not null] does not work for large
queries)
> [WHERE col is not null] does not work sometimes for queries with many JOIN
> statements
> -------------------------------------------------------------------------------------
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
> acct.ACC_N,
> acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
> L.id = 4436
> and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10 122
> NULL NULL
> NULL NULL
> {code}
> but it should be 1 row
> {code}
> 10 122
> {code}
> 3. The workaround is to put "acct.brn is not null" to join condition
> {code}
> select
> acct.ACC_N,
> acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid and acct.brn is not null
> WHERE
> L.id = 4436;
> OK
> 10 122
> Time taken: 23.479 seconds, Fetched: 1 row(s)
> {code}
> I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations
> have the issue
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)