[
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Pivovarov updated HIVE-10841:
---------------------------------------
Description:
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.
To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select
acct.ACC_N,
acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
L.id = 4436
and acct.brn is not null;
{code}
the result is 3 rows
{code}
10 122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10 122
{code}
3. The workaround is to put "acct.brn is not null" to join condition
{code}
select
acct.ACC_N,
acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
L.id = 4436;
OK
10 122
{code}
I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations
have the issue
was:
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.
To reproduce the issue in Hive
1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select
acct.ACC_N,
acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
L.id = 4436
and acct.brn is not null;
{code}
the result is 3 rows
{code}
10 122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10 122
{code}
3. workaround is to put "acct.brn is not null" to join condition
{code}
select
acct.ACC_N,
acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
L.id = 4436;
{code}
I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations
have the issue
> [WHERE col is not null] does not work for large queries
> -------------------------------------------------------
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
> acct.ACC_N,
> acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
> L.id = 4436
> and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10 122
> NULL NULL
> NULL NULL
> {code}
> but it should be 1 row
> {code}
> 10 122
> {code}
> 3. The workaround is to put "acct.brn is not null" to join condition
> {code}
> select
> acct.ACC_N,
> acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid and acct.brn is not null
> WHERE
> L.id = 4436;
> OK
> 10 122
> {code}
> I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations
> have the issue
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)