[
https://issues.apache.org/jira/browse/HIVE-29460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061390#comment-18061390
]
Kokila N commented on HIVE-29460:
---------------------------------
Hi [~dkuzmenko]
*Hive vs Iceberg table with CLUSTERED BY*
*Hive table:*
CREATE EXTERNAL TABLE clustered_by (id string, key int, value string)
CLUSTERED BY (key) INTO 4 BUCKETS;
EXPLAIN INSERT INTO clustered_by VALUES
('a', 1, 'val1'),
('b', 2, 'val2'),
('c', 3, 'val3');
{code:java}
Reduce Output Operator
key expressions: _col1 (type: int)
null sort order: a
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string), _col2 (type: string){code}
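To make the effect of "Map-reduce partition columns: _col1" concrete, here is a minimal sketch of partition-by-bucket routing: each row is sent to the reducer that owns the bucket of its key. The class and method names (BucketRouting, bucketFor, route) are hypothetical and only for illustration. It also assumes that the hash of an int key is the value itself, which is a simplification; the actual hash function depends on the table's bucketing version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketRouting {
    // Hypothetical helper: map a hash code to a bucket in [0, numBuckets).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    public static int bucketFor(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Group keys by the bucket they would be routed to.
    public static Map<Integer, List<Integer>> route(int[] keys, int numBuckets) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int key : keys) {
            // Assumption: the hash of an int key is the int value itself.
            int hash = Integer.hashCode(key);
            buckets.computeIfAbsent(bucketFor(hash, numBuckets), b -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Keys from the INSERT above, CLUSTERED BY (key) INTO 4 BUCKETS.
        System.out.println(route(new int[] {1, 2, 3}, 4)); // prints {1=[1], 2=[2], 3=[3]}
    }
}
```

Under these assumptions the three inserted rows land in three different buckets. Without key expressions and partition columns in the plan, no such routing takes place.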
*Iceberg table:*
CREATE EXTERNAL TABLE ice_clustered_by (id string, key int, value string)
CLUSTERED BY (key) INTO 4 BUCKETS
STORED BY ICEBERG;
EXPLAIN INSERT INTO ice_clustered_by VALUES
('a', 1, 'val1'),
('b', 2, 'val2'),
('c', 3, 'val3');
{code:java}
No key expressions, no Map-reduce partition columns{code}
Iceberg tables created with CLUSTERED BY should produce the same plan as Hive
tables: a ReduceSink partitioned by the bucketing column, with the
corresponding edges. In other words, the CLUSTERED BY clause should be honored
when creating or converting an Iceberg table.
Could you confirm whether this is the expected behavior for this ticket?
Let me know if I can work on this.
> Iceberg: Create/convert table disregards the `CLUSTERED BY` clause
> ------------------------------------------------------------------
>
> Key: HIVE-29460
> URL: https://issues.apache.org/jira/browse/HIVE-29460
> Project: Hive
> Issue Type: Bug
> Reporter: Denys Kuzmenko
> Priority: Major
>
> Clustering and sorting:
> ref:
> https://tabular.io/blog/whats-new-in-iceberg-0.13/#clustering-and-sorting-as-configuration
>