[
https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Srini Pindi updated HIVE-11034:
-------------------------------
Attachment: (was: hive_issue.zip)
> Joining multiple tables producing different results with different order of
> join
> --------------------------------------------------------------------------------
>
> Key: HIVE-11034
> URL: https://issues.apache.org/jira/browse/HIVE-11034
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.13.0
> Environment: Linux 2.6.32-279.19.1.el6.x86_64
> Reporter: Srini Pindi
> Priority: Critical
>
> Join between one main table with other tables with different join columns
> returns wrong results in hive. Changing the order of the joins between main
> table and other tables is producing different results.
> Here are the steps to reproduce the issue:
> 1. Create tables as follows:
> create table p(ck string, email string);
> create table a1(ck string, flag string);
> create table a2(email string, flag string);
> create table a3(ck string, flag string);
> 2. Load data into the tables as follows:
> P
> ||ck||email||
> |10|e10|
> |20|e20|
> |30|e30|
> |40|e40|
>
> A1
> ||ck||flag||
> |10||N|
> |20||Y|
> |30||Y|
> |40||Y|
> A2
> ||email||flag||
> |e10||Y|
> |e20||N|
> |e30||Y|
> |e40||Y|
>
> A3
> ||ck||flag||
> |10||Y|
> |20||Y|
> |30||N|
> |40||Y|
>
> 3. Good query:
> {panel}
> select p.ck
> from p
> left outer join a1 on p.ck = a1.ck
> left outer join a3 on p.ck = a3.ck
> left outer join a2 on p.email = a2.email
> where a1.flag = 'Y'
> and a3.flag = 'Y'
> and a2.flag = 'Y'
> ;
> {panel}
> and results are
> 40
> 4. Bad query
> {panel}
> select p.ck
> from p
> left outer join a1 on p.ck = a1.ck
> left outer join a2 on p.email = a2.email
> left outer join a3 on p.ck = a3.ck
> where a1.flag = 'Y'
> and a2.flag = 'Y'
> and a3.flag = 'Y'
> ;
> {panel}
> Producing results as:
> 30
> 40
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)