Shohei Okumiya created HIVE-28798:
-------------------------------------

             Summary: BucketMapJoin using partial partition transforms
                 Key: HIVE-28798
                 URL: https://issues.apache.org/jira/browse/HIVE-28798
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Shohei Okumiya
            Assignee: Shohei Okumiya


The current implementation requires all bucket transforms to be projected. 
Unlike Hive's native bucketing, Iceberg allows multiple bucket keys to be 
decomposed into multiple partition transforms. For example,
{code:java}
CREATE TABLE srcbucket_big(key1 int, key2 string, value string, id int)
PARTITIONED BY SPEC(bucket(4, key1), bucket(8, key2)) STORED BY ICEBERG; {code}
Currently, BucketMapJoin (BMJ) is applied only when both key1 and key2 are used as join keys.
{code:java}
SELECT a.key1, a.key2, a.id
FROM srcbucket_big a
JOIN src_small b ON a.key1 = b.key1 AND a.key2 = b.key2
ORDER BY a.id; {code}
Considering the storage layout of Apache Iceberg, the following query can also 
leverage BMJ.
{code:java}
SELECT a.key1, a.id
FROM srcbucket_big a
JOIN src_small b ON a.key1 = b.key1
ORDER BY a.id; {code}
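The reasoning can be illustrated with a toy model, which is not Hive's implementation. Because each Iceberg partition is identified by the pair (bucket(4, key1), bucket(8, key2)), every row in a partition shares the same key1 bucket, so a join on key1 alone only has to probe the small-side rows that hash to that bucket. In this sketch the `bucket` function, the sample rows, and the two-column small table are all assumptions made for illustration; Iceberg's actual bucket transform uses a 32-bit Murmur3 hash.

```python
import zlib
from collections import defaultdict


def bucket(n, value):
    """Toy stand-in for Iceberg's bucket transform (hypothetical hash, not Murmur3)."""
    return zlib.crc32(repr(value).encode()) % n


# Big table rows (key1, key2, value, id), laid out by
# (bucket(4, key1), bucket(8, key2)) like srcbucket_big.
big_rows = [(k1, f"k{k2}", "v", 100 * k1 + k2) for k1 in range(10) for k2 in range(5)]
partitions = defaultdict(list)
for row in big_rows:
    partitions[(bucket(4, row[0]), bucket(8, row[1]))].append(row)

# Small table rows (key1, payload), hashed on key1 only with the same function.
small_rows = [(k1, f"s{k1}") for k1 in (1, 3, 4, 7, 42)]
small_buckets = defaultdict(list)
for row in small_rows:
    small_buckets[bucket(4, row[0])].append(row)


def bucket_map_join():
    """Join on key1: each big-table partition probes only its matching small bucket."""
    out = []
    for (b1, _b2), rows in partitions.items():
        # Build a hash table from the single small-side bucket that can match.
        lookup = defaultdict(list)
        for k1, payload in small_buckets.get(b1, []):
            lookup[k1].append(payload)
        for key1, _key2, _value, id_ in rows:
            for _payload in lookup.get(key1, []):
                out.append((key1, id_))
    return sorted(out, key=lambda r: r[1])  # ORDER BY a.id


def plain_join():
    """Reference result: a nested-loop join that ignores the storage layout."""
    pairs = [(b[0], b[3]) for b in big_rows for s in small_rows if b[0] == s[0]]
    return sorted(pairs, key=lambda r: r[1])
```

Comparing `bucket_map_join()` with `plain_join()` shows that the key2 component of the partition tuple never has to be consulted: it only splits each key1 bucket into more files.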
This optimization would also be helpful in combination with 
[HIVE-28414|https://issues.apache.org/jira/browse/HIVE-28414], which extends the 
optimization to non-bucket transforms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
