[
https://issues.apache.org/jira/browse/HIVE-27357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-27357:
----------------------------------
Labels: pull-request-available (was: )
> Map-side SMB Join returns incorrect result when 2 tables have different
> bucket counts
> --------------------------------------------------------------------------------------
>
> Key: HIVE-27357
> URL: https://issues.apache.org/jira/browse/HIVE-27357
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.3, 4.0.0-alpha-2
> Reporter: Seonggon Namgung
> Assignee: Seonggon Namgung
> Priority: Major
> Labels: pull-request-available
>
> The following query returns {(1, 1), (2, 2), (7, 7)} instead of {(1, 1),
> (2, 2), (7, 7), (6, 6), (14, 14), (11, 11)}.
>
>
> {code:java}
> set hive.strict.checks.bucketing=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.auto.convert.join.noconditionaltask.size=1;
> set hive.optimize.dynamic.partition.hashjoin=false;
> DROP TABLE IF EXISTS bucket2;
> CREATE TABLE bucket2(key string, value string) CLUSTERED BY (key) SORTED BY
> (key) INTO 2 BUCKETS;
> DROP TABLE IF EXISTS bucket3;
> CREATE TABLE bucket3(key string, value string) CLUSTERED BY (key) SORTED BY
> (key) INTO 3 BUCKETS;
> INSERT INTO TABLE bucket2 VALUES (1, 1), (2, 2), (7, 7), (6, 6), (14, 14),
> (11, 11);
> INSERT INTO TABLE bucket3 VALUES (1, 1), (2, 2), (7, 7), (6, 6), (14, 14),
> (11, 11);
> SELECT * FROM bucket2 JOIN bucket3 ON bucket2.key = bucket3.key;
> {code}
>
>
> It is commonly understood that sort-merge join is used only when the two
> tables have the same number of buckets, but I could not find such a
> restriction in the source code. Also, current Hive uses map-side SMB join for
> the above query, which joins a 2-bucket table with a 3-bucket table. So I'm
> planning to fix this issue without falling back to another join algorithm.
> Originally, we found this issue by running auto_sortmerge_join_12.q with
> hive.strict.checks.bucketing=true.
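
A possible way to cross-check the report (a sketch, not part of the original issue): rerun the same join in the same session with the two sort-merge conversions switched off, then compare the output with the six expected rows listed above. The properties and table names are the ones used in the reproduction. The assumption is that disabling them makes Hive choose a non-SMB join for this query, which the EXPLAIN output should confirm.

```sql
-- Assumes bucket2 and bucket3 were created and loaded as in the reproduction above.
-- Switch off conversion to sort-merge bucket join (both properties appear in the reproduction).
set hive.auto.convert.sortmerge.join=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;

-- Inspect the plan first: it is expected (assumption) to no longer contain an SMB join.
EXPLAIN SELECT * FROM bucket2 JOIN bucket3 ON bucket2.key = bucket3.key;

-- If the bug is specific to the SMB path, this should return all six rows.
SELECT * FROM bucket2 JOIN bucket3 ON bucket2.key = bucket3.key;
```

If the second query returns all six rows while the original session returns only three, that points at the map-side SMB join path rather than at the data or the bucketing itself.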