[ https://issues.apache.org/jira/browse/IMPALA-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-11583.
-------------------------------------
Fix Version/s: Impala 4.2.0
Resolution: Fixed
> Use Iceberg APIs to update table properties for Iceberg tables
> --------------------------------------------------------------
>
> Key: IMPALA-11583
> URL: https://issues.apache.org/jira/browse/IMPALA-11583
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.2.0
>
>
> COMPUTE STATS updates table-level stats via the alter_table() HMS API. This
> call replaces the whole HMS table, so concurrent modifications made by
> another engine, e.g. Hive, can be lost.
> This is critical for Iceberg tables, as the 'metadata_location' table
> property must always point to the latest snapshot. Inadvertently overwriting
> it with a stale value during COMPUTE STATS can result in data loss.
> Table-level stats like 'numRows' and 'totalSize' are already updated by
> Iceberg during table modifications, so COMPUTE STATS does not need to update
> these values.
> Column stats are not affected as they are updated via a different API call
> ([updateTableColumnStatistics()|https://github.com/apache/impala/blob/4e813b7085c995a7244ef886b00c22e9d93cc80c/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1638]),
> which doesn't touch the table properties. But updating statistics also
> requires us to update the table property "impala.lastComputeStatsTime". We
> should update it via Iceberg APIs when HiveCatalog is used:
> https://github.com/apache/impala/blob/4e813b7085c995a7244ef886b00c22e9d93cc80c/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java#L211
> For catalogs other than HiveCatalog we still need to update the table
> property via the HMS API. This should be safe, as other catalogs don't depend
> on HMS table properties.
> Reloading the HMS table before invoking 'alter_table()' can be considered in
> other cases (non-Iceberg tables as well), to decrease the possibility of
> losing concurrent table updates.
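
The lost-update hazard described above can be illustrated with a toy model.
This is only a sketch: ToyMetastore and its methods are hypothetical stand-ins
invented for this example, not the HMS, Iceberg, or Impala APIs. It contrasts
writing back a stale copy of the whole table with updating a single property.

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    /** Hypothetical stand-in for a metastore holding one table's properties. */
    static final class ToyMetastore {
        private Map<String, String> props = new HashMap<>();

        /** Returns a copy, like loading a table object from the metastore. */
        Map<String, String> getTable() {
            return new HashMap<>(props);
        }

        /** Replaces ALL properties with the caller's copy (whole-table write). */
        void alterTable(Map<String, String> newProps) {
            props = new HashMap<>(newProps);
        }

        /** Changes one key and leaves every other property untouched. */
        void setProperty(String key, String value) {
            props.put(key, value);
        }
    }

    /** Engine A writes back a stale whole-table copy after engine B committed. */
    static Map<String, String> wholeTableReplace() {
        ToyMetastore hms = new ToyMetastore();
        hms.setProperty("metadata_location", "snap-1");

        Map<String, String> stale = hms.getTable();       // A loads the table
        hms.setProperty("metadata_location", "snap-2");   // B commits a new snapshot
        stale.put("impala.lastComputeStatsTime", "1700000000");
        hms.alterTable(stale);                            // A overwrites B's change
        return hms.getTable();
    }

    /** Engine A only touches the one property it actually needs to change. */
    static Map<String, String> singlePropertyUpdate() {
        ToyMetastore hms = new ToyMetastore();
        hms.setProperty("metadata_location", "snap-1");

        hms.setProperty("metadata_location", "snap-2");   // B commits a new snapshot
        hms.setProperty("impala.lastComputeStatsTime", "1700000000");
        return hms.getTable();
    }

    public static void main(String[] args) {
        // Whole-table replace: metadata_location is rolled back to the old snapshot.
        System.out.println(wholeTableReplace().get("metadata_location"));    // snap-1
        // Single-property update: the newer snapshot pointer survives.
        System.out.println(singlePropertyUpdate().get("metadata_location")); // snap-2
    }
}
```

In the toy model both variants record "impala.lastComputeStatsTime", but only
the whole-table write rolls 'metadata_location' back to the older snapshot.
A real catalog commit involves more than a map update, so this shows the
failure mode only, not the actual fix.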
--
This message was sent by Atlassian Jira
(v8.20.10#820010)