[
https://issues.apache.org/jira/browse/HIVE-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denys Kuzmenko updated HIVE-28120:
----------------------------------
Priority: Major (was: Critical)
> When insert overwrite the iceberg table, data will loss if the sql contains
> union all
> -------------------------------------------------------------------------------------
>
> Key: HIVE-28120
> URL: https://issues.apache.org/jira/browse/HIVE-28120
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Environment: hadoop version: 3.3.1
> hive version: 4.0.0-beta-1
> iceberg version: 1.3.0
> Reporter: xinmingchang
> Priority: Major
>
> {{(1)}}
> create table tmp.test_iceberg_overwrite_union_all(
> a string
> )
> stored by iceberg
> ;
> {{(2)}}
> insert overwrite table tmp.test_iceberg_overwrite_union_all
> select distinct 'a' union all select distinct 'b';
> {{(3)}}
> select * from tmp.test_iceberg_overwrite_union_all;
>
> the result only has one record:
> +-------------------------------------+
> | test_iceberg_overwrite_union_all.a |
> +-------------------------------------+
> | a |
> +-------------------------------------+
> According to the hiveserver log, this query will start two jobs, and each job
> will be committed. The problem is that the job that is committed later is
> also an overwrite, causing the result of the first commit to be overwritten.
> like this:
> 2024-03-05T22:10:12,995 INFO [iceberg-commit-table-pool-0]:
> hive.HiveIcebergOutputCommitter () - Committing job has started for table:
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:13,081 INFO [iceberg-commit-table-pool-1]:
> hive.HiveIcebergOutputCommitter () - Committing job has started for table:
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:15,152 INFO [iceberg-commit-table-pool-0]:
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 2157 ms for table:
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
> 2024-03-05T22:10:16,980 INFO [iceberg-commit-table-pool-1]:
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 3899 ms for table:
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)