Rahul Challapalli created DRILL-2046:
----------------------------------------
Summary: Merge join inconsistent results
Key: DRILL-2046
URL: https://issues.apache.org/jira/browse/DRILL-2046
Project: Apache Drill
Issue Type: Bug
Components: Execution - Operators
Reporter: Rahul Challapalli
Assignee: Aman Sinha
Priority: Critical
git.commit.id.abbrev=a418af1
The queries below should all return the same number of records. However, the
counts do not match when merge join is used.
{code}
alter session set `planner.enable_hashjoin` = false;

select ws1.*
from widestrings_small ws1 INNER JOIN widestrings_small ws2
  on ws1.str_fixed_null_empty = ws2.str_var_null_empty
where ws1.str_fixed_null_empty is not null
  and ws2.str_var_null_empty is not null
  and ws1.tinyint_var > 120;
-- 6 records

select count(*)
from widestrings_small ws1 INNER JOIN widestrings_small ws2
  on ws1.str_fixed_null_empty = ws2.str_var_null_empty
where ws1.str_fixed_null_empty is not null
  and ws2.str_var_null_empty is not null
  and ws1.tinyint_var > 120;
-- 60 records

select count(ws1.str_var)
from widestrings_small ws1 INNER JOIN widestrings_small ws2
  on ws1.str_fixed_null_empty = ws2.str_var_null_empty
where ws1.str_fixed_null_empty is not null
  and ws2.str_var_null_empty is not null
  and ws1.tinyint_var > 120;
-- 4 records
{code}
With hash join, all of the above queries return 60 records. I have attached
the dataset used. Let me know if you have any questions.
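
For reference, the expected invariant can be sketched outside Drill. The snippet below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module and a made-up five-row table `t` (columns `k_fixed`, `k_var`, `s`, `n` are hypothetical stand-ins, not the attached dataset), and it shows that the three query shapes agree for one and the same inner join. The `count(column)` form agrees only when the counted column is non-null in every joined row, which the hash join result of 60 suggests is the case here.

```python
import sqlite3

# Shared join and filter, mirroring the shape of the queries in this report.
# Table and column names are hypothetical stand-ins for widestrings_small.
JOIN_SQL = (
    "FROM t ws1 INNER JOIN t ws2 ON ws1.k_fixed = ws2.k_var "
    "WHERE ws1.k_fixed IS NOT NULL AND ws2.k_var IS NOT NULL AND ws1.n > 120"
)


def join_counts(conn):
    """Return (rows from select *, count(*), count(column)) for the same join."""
    rows = len(conn.execute("SELECT ws1.* " + JOIN_SQL).fetchall())
    star = conn.execute("SELECT count(*) " + JOIN_SQL).fetchone()[0]
    col = conn.execute("SELECT count(ws1.s) " + JOIN_SQL).fetchone()[0]
    return rows, star, col


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k_fixed TEXT, k_var TEXT, s TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a", "a", "x", 121),   # duplicate join keys fan out the result
        ("a", "a", "y", 125),
        ("b", "a", "z", 127),
        (None, "b", "w", 126),  # NULL key on the left side is filtered out
        ("c", None, "v", 100),  # fails the n > 120 filter
    ],
)

rows, star, col = join_counts(conn)
print(rows, star, col)
```

Running the same three-way comparison against the attached dataset, once with merge join and once with hash join, should make the discrepancy reported above easy to reproduce.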