J. Tipan Verella created HIVE-7555:
--------------------------------------
Summary: inner join is being resolved as cartesian product
Key: HIVE-7555
URL: https://issues.apache.org/jira/browse/HIVE-7555
Project: Hive
Issue Type: Bug
Environment: CentOS
Reporter: J. Tipan Verella
I believe this is a bug, because I have not been able to find a workaround
for the problem described in the following Stack Overflow question:
http://stackoverflow.com/questions/25020190/hive-query-returns-cartesian-product-instead-of-inner-join
The issue is as follows (repeated from SO for convenience).
This is the type of query I am sending to Hive:
SELECT BigTable.nicefield, LargeTable.*
FROM LargeTable INNER JOIN BigTable
ON (
    LargeTable.joinfield1of4 = BigTable.joinfield1of4
    AND LargeTable.joinfield2of4 = BigTable.joinfield2of4
)
WHERE LargeTable.joinfield3of4=20140726 AND LargeTable.joinfield4of4=15
  AND BigTable.joinfield3of4=20140726 AND BigTable.joinfield4of4=15
  AND LargeTable.filterfiled1of2=123456
  AND LargeTable.filterfiled2of2=98765
  AND LargeTable.joinfield2of4=12
  AND LargeTable.joinfield1of4='iwanttolikehive'
It returns `2418025` rows. The issue is that the following query on LargeTable alone
SELECT *
FROM LargeTable
WHERE joinfield3of4=20140726 AND joinfield4of4=15
  AND filterfiled1of2=123456
  AND filterfiled2of2=98765
  AND joinfield2of4=12
  AND joinfield1of4='iwanttolikehive'
returns `1555` rows, and so does:
SELECT *
FROM BigTable
WHERE joinfield3of4=20140726 AND joinfield4of4=15
  AND joinfield2of4=12
  AND joinfield1of4='iwanttolikehive'
Note that **1555^2 = 2418025**.
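The arithmetic above can be reproduced outside Hive. The following is a minimal sketch, not a diagnosis of the actual tables: it uses Python's built-in sqlite3 module instead of Hive, and the table names `large_t` and `big_t`, the columns `k1`, `k2`, and `payload`, and the helper `joined_row_count` are all hypothetical stand-ins. It assumes that every filtered row on both sides carries the same pair of join-key values, which the report does not confirm. Under that assumption a standard inner join returns n * m rows, because each left row matches every right row.

```python
import sqlite3


def joined_row_count(n_left, n_right):
    """Count inner-join rows when every row on both sides shares one join key.

    Table and column names are hypothetical; they only mimic the shape of the
    queries in the report (two equality conditions in the ON clause).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE large_t (k1 TEXT, k2 INTEGER, payload INTEGER)")
    conn.execute("CREATE TABLE big_t (k1 TEXT, k2 INTEGER, nicefield INTEGER)")

    # Every row gets the same (k1, k2) pair, mirroring the constants that the
    # WHERE clause pins the join fields to.
    conn.executemany(
        "INSERT INTO large_t VALUES ('iwanttolikehive', 12, ?)",
        [(i,) for i in range(n_left)],
    )
    conn.executemany(
        "INSERT INTO big_t VALUES ('iwanttolikehive', 12, ?)",
        [(i,) for i in range(n_right)],
    )

    (count,) = conn.execute(
        "SELECT COUNT(*) FROM large_t l INNER JOIN big_t b "
        "ON l.k1 = b.k1 AND l.k2 = b.k2"
    ).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print(joined_row_count(1555, 1555))  # prints 2418025
```

If the real tables behave this way, the row count would be expected join behaviour rather than a planner fault. Counting rows per (joinfield1of4, joinfield2of4) pair on each filtered table would be one way to check that.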
Feel free to discard this issue if it is not a bug, but please provide a
solution on SO.
Thank you.
--
This message was sent by Atlassian JIRA
(v6.2#6252)