[ 
https://issues.apache.org/jira/browse/HIVE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Norvell updated HIVE-21322:
---------------------------------
    Description: 
Reproduction:

Create tables:
{code:java}
create table a (id string); insert into a values (1),(2),(3);

create table b (id string, name string); insert into b values 
(1,'a'),(2,'b'),(3,null);

create table c (id string); insert into c values (11),(22),(33);

{code}
When joining a -> b -> c on id, the following query is correct:
{code:java}
select a.id, b.name from a 
left outer join b on a.id = b.id 
left outer join c on a.id = c.id 
where b.name is not null;

OK
1    a
2    b
Time taken: 10.231 seconds, Fetched: 2 row(s)
{code}
Switching the join order from a -> c -> b results in incorrect results:
{code:java}
select a.id, b.name from a 
left outer join c on a.id = c.id 
left outer join b on a.id = b.id 
where b.name is not null;

OK
2    b
Time taken: 10.321 seconds, Fetched: 1 row(s)
{code}
Disabling hive.cbo.enable or changing execution engine to mr avoids the issue:
{code:java}
set hive.cbo.enable=false;
select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
on a.id = b.id where b.name is not null;
OK
1    a
2    b
Time taken: 9.614 seconds, Fetched: 2 row(s)


set hive.cbo.enable=true;
set hive.execution.engine=mr;
select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
on a.id = b.id where b.name is not null;
OK
1    a
2    b
Time taken: 29.377 seconds, Fetched: 2 row(s)
{code}
Issue doesn't occur when using 'is null':
{code:java}
select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
on a.id = b.id where b.name is null;

OK
3    NULL
Time taken: 9.673 seconds, Fetched: 1 row(s)
{code}
Explain plans for queries attached.

  was:
Reproduction:

Create tables: 
{code:java}
create table a (id string); insert into a values (1),(2),(3);

create table b (id string, name string); insert into b values 
(1,'a'),(2,'b'),(3,null);

create table c (id string); insert into c values (11),(22),(33);

{code}
When joining a -> b -> c on id, the following query is correct: 
{code:java}
select a.id, b.name from a left outer join b on a.id = b.id left outer join c 
on a.id = c.id where b.name is not null;

OK
1    a
2    b
Time taken: 10.231 seconds, Fetched: 2 row(s)
{code}
Switching the join order from a -> c -> b results in incorrect results: 
{code:java}
select a.id, b.name from a 
left outer join c on a.id = c.id 
left outer join b on a.id = b.id 
where b.name is not null;

OK
2    b
Time taken: 10.321 seconds, Fetched: 1 row(s)
{code}
Disabling hive.cbo.enable or changing execution engine to mr avoids the issue: 
{code:java}
set hive.cbo.enable=false;
select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
on a.id = b.id where b.name is not null;
OK
1    a
2    b
Time taken: 9.614 seconds, Fetched: 2 row(s)


set hive.cbo.enable=true;
set hive.execution.engine=mr;
select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
on a.id = b.id where b.name is not null;
OK
1    a
2    b
Time taken: 29.377 seconds, Fetched: 2 row(s)
{code}
Issue doesn't occur when using 'is null':
{code:java}
select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
on a.id = b.id where b.name is null;

OK
3    NULL
Time taken: 9.673 seconds, Fetched: 1 row(s)
{code}
Explain plans for queries attached.


> Multiple table LEFT OUTER JOIN results are incorrect when 'is not null' used 
> in WHERE clause.
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21322
>                 URL: https://issues.apache.org/jira/browse/HIVE-21322
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.3.4
>         Environment: Hive 2.3.4 (emr-5.21.0)
>            Reporter: James Norvell
>            Priority: Minor
>         Attachments: explain-plans.txt
>
>
> Reproduction:
> Create tables:
> {code:java}
> create table a (id string); insert into a values (1),(2),(3);
> create table b (id string, name string); insert into b values 
> (1,'a'),(2,'b'),(3,null);
> create table c (id string); insert into c values (11),(22),(33);
> {code}
> When joining a -> b -> c on id, the following query is correct:
> {code:java}
> select a.id, b.name from a 
> left outer join b on a.id = b.id 
> left outer join c on a.id = c.id 
> where b.name is not null;
> OK
> 1    a
> 2    b
> Time taken: 10.231 seconds, Fetched: 2 row(s)
> {code}
> Switching the join order from a -> c -> b results in incorrect results:
> {code:java}
> select a.id, b.name from a 
> left outer join c on a.id = c.id 
> left outer join b on a.id = b.id 
> where b.name is not null;
> OK
> 2    b
> Time taken: 10.321 seconds, Fetched: 1 row(s)
> {code}
> Disabling hive.cbo.enable or changing execution engine to mr avoids the issue:
> {code:java}
> set hive.cbo.enable=false;
> select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
> on a.id = b.id where b.name is not null;
> OK
> 1    a
> 2    b
> Time taken: 9.614 seconds, Fetched: 2 row(s)
> set hive.cbo.enable=true;
> set hive.execution.engine=mr;
> select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
> on a.id = b.id where b.name is not null;
> OK
> 1    a
> 2    b
> Time taken: 29.377 seconds, Fetched: 2 row(s)
> {code}
> Issue doesn't occur when using 'is null':
> {code:java}
> select a.id, b.name from a left outer join c on a.id = c.id left outer join b 
> on a.id = b.id where b.name is null;
> OK
> 3    NULL
> Time taken: 9.673 seconds, Fetched: 1 row(s)
> {code}
> Explain plans for queries attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to