[
https://issues.apache.org/jira/browse/HIVE-29290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036224#comment-18036224
]
Krisztian Kasa commented on HIVE-29290:
---------------------------------------
CommonJoinOperator.java doesn't handle well Anti join in case of n way joins
https://github.com/apache/hive/blob/d186aba1923477949d4cede508566f4c3fb5cceb/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java#L975-L1010
In case of n way join the flag {{noOuterJoin}} is {{false}}. See the else
branch of the if statement: it doesn't depend on the flag {{isRightOfAntiJoin}}
which indicates whether the current join input is a right side of an anti join.
I submitted a draft patch to try to improve the operator.
> Enabling "hive.merge.nway.joins" returns wrong resutls
> ------------------------------------------------------
>
> Key: HIVE-29290
> URL: https://issues.apache.org/jira/browse/HIVE-29290
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 4.1.0, 4.0.1
> Reporter: Chinna Rao Lalam
> Priority: Major
> Labels: correctness, pull-request-available
>
> Enabling CBO and "nway.joins" returns wrong results.
>
> set hive.merge.nway.joins=true;
> set hive.cbo.enable=true;
> create table taba(id string);
> INSERT INTO TABLE taba VALUES ('1'),('2');
> create table tabb(id string);
> INSERT INTO TABLE tabb VALUES ('1');
> create table tabc(id string);
> INSERT INTO TABLE tabc VALUES ('1'),('2'),('2');
> //Full data
> select * from taba A left outer join tabb B on (A.id = B.id) left outer join
> tabc C on (C.id = A.id);
> {code}
> +-----+-------+-------+
> | id | id_1 | id_2 |
> +-----+-------+-------+
> | 1 | 1 | 1 |
> | 2 | NULL | 2 |
> | 2 | NULL | 2 |
> +-----+-------+-------+
> {code}
>
> //ID is not null
> select * from taba A left outer join chinna.tabb B on (A.id = B.id) left
> outer join chinna.tabc C on (C.id = A.id) where B.id is not null;
> {code}
> +-----+-------+-------+
> | id | id_1 | id_2 |
> +-----+-------+-------+
> | 1 | 1 | 1 |
> +-----+-------+-------+
> {code}
>
> //ID is null
> select * from taba A left outer join tabb B on (A.id = B.id) left outer join
> tabc C on (C.id = A.id) where B.id is null;
> {code}
> +-----+-------+-------+
> | id | id_1 | id_2 |
> +-----+-------+-------+
> | 1 | NULL | 1 |
> | 2 | NULL | 2 |
> | 2 | NULL | 2 |
> +-----+-------+-------+
> {code}
> In this case B.id is not null for id=1, Correct output is
> {code}
> +-----+-------+-------+
> | id | id_1 | id_2 |
> +-----+-------+-------+
> | 2 | NULL | 2 |
> | 2 | NULL | 2 |
> +-----+-------+-------+
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)