[ https://issues.apache.org/jira/browse/HIVE-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ádám Szita updated HIVE-26133:
------------------------------
Fix Version/s: 4.0.0-alpha-2
> Insert overwrite on Iceberg tables can result in duplicate entries after
> partition evolution
> --------------------------------------------------------------------------------------------
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite the partitions affected by the
> query.
> Suppose we write out a record with specA (e.g. day(ts)), resulting in a data file:
> /tableRoot/data/ts_day="2020-10-24"/ffffgggg.orc
> If the table is then changed to specB (e.g. day(ts), name), the same record would
> go to a different partition:
> /tableRoot/data/ts_day="2020-10-24"/name="Mike"/ffffgggg.orc
> If we then overwrite the table with itself, Hive detects that these two
> records belong to different partitions (as they do) and therefore does
> not overwrite the original record with the new one, resulting in duplicate
> entries.
> {code:sql}
> create table testice1000 (a int, b string) stored by iceberg stored as orc
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> +----------------+----------------+
> | testice1000.a  | testice1000.b  |
> +----------------+----------------+
> | 11             | ddd            |
> | 11             | ddd            |
> | 22             | ttt            |
> | 22             | ttt            |
> | 33             | rrfdfdf        |
> +----------------+----------------+
> {code}
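
The duplication described above can be reproduced with a small toy model. This is only a sketch of the reasoning, not Hive's or Iceberg's actual implementation: the `Table` class, the `spec_*` functions and `reproduce()` are hypothetical names introduced for illustration. It assumes that a partition-wise overwrite drops only those existing files whose partition key matches a key produced by the incoming rows under the current spec.

```python
from collections import namedtuple

Record = namedtuple("Record", ["a", "b"])


def spec_unpartitioned(rec):
    """Original spec: every record lands in the single, empty partition."""
    return ()


def spec_truncate_b(rec):
    """Evolved spec, modelled on truncate(2, b): key is the first 2 chars of b."""
    return ("b_trunc=" + rec.b[:2],)


class Table:
    """Toy table: each 'file' is a (partition_key, record) pair."""

    def __init__(self, spec):
        self.spec = spec
        self.files = []

    def insert(self, records):
        # New data is always written with the current spec.
        self.files.extend((self.spec(r), r) for r in records)

    def scan(self):
        return [r for _, r in self.files]

    def insert_overwrite(self, records):
        # Assumed behaviour: only partitions touched by the new rows
        # (as computed with the current spec) are replaced.
        new_files = [(self.spec(r), r) for r in records]
        touched = {key for key, _ in new_files}
        kept = [(k, r) for k, r in self.files if k not in touched]
        self.files = kept + new_files


def reproduce():
    table = Table(spec_unpartitioned)
    table.insert([Record(11, "ddd"), Record(22, "ttt")])
    table.spec = spec_truncate_b            # partition evolution
    table.insert([Record(33, "rrfdfdf")])
    table.insert_overwrite(table.scan())    # overwrite the table with itself
    return sorted(tuple(r) for r in table.scan())


if __name__ == "__main__":
    for row in reproduce():
        print(row)
```

In this model the rows written under the old spec keep the key `()`, which none of the rewritten rows produce under the new spec, so their files survive the overwrite and the rows `(11, 'ddd')` and `(22, 'ttt')` appear twice, matching the query result in the report.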
--
This message was sent by Atlassian Jira
(v8.20.10#820010)