[ 
https://issues.apache.org/jira/browse/IMPALA-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11583.
-------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Use Iceberg APIs to update table properties for Iceberg tables
> --------------------------------------------------------------
>
>                 Key: IMPALA-11583
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11583
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.2.0
>
>
> COMPUTE STATS updates table-level stats via the alter_table() HMS API. This 
> call replaces the whole HMS table, so if another engine (e.g. Hive) modifies 
> the table concurrently, those modifications can be lost.
> This is critical for Iceberg tables, as the 'metadata_location' table 
> property must always point to the latest snapshot. Inadvertently overwriting 
> it during COMPUTE STATS can result in data loss.
> Table-level stats like 'numRows' and 'totalSize' are already updated by 
> Iceberg during table modifications, so there is no need to update these 
> values during COMPUTE STATS.
> Column stats are not affected, as they are updated via a different API call 
> ([updateTableColumnStatistics()|https://github.com/apache/impala/blob/4e813b7085c995a7244ef886b00c22e9d93cc80c/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1638])
> that doesn't touch the table properties. However, updating statistics also 
> requires us to update the table property "impala.lastComputeStatsTime". We 
> should update it via Iceberg APIs when HiveCatalog is used:
> https://github.com/apache/impala/blob/4e813b7085c995a7244ef886b00c22e9d93cc80c/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java#L211
> For catalogs other than HiveCatalog we still need to update the table 
> property via the HMS API. This should be safe, as other catalogs don't depend 
> on HMS table properties.
> Reloading the HMS table before invoking 'alter_table()' could also be 
> considered in other cases (including non-Iceberg tables) to reduce the chance 
> of losing concurrent table updates.
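
The lost-update hazard described above can be illustrated with a small, self-contained sketch. It is not Impala or HMS code: ToyMetastore, its methods, and the metadata file names are hypothetical stand-ins that only model "replace the whole table" versus "set a single property". In real Iceberg code, the property-level path would go through the Iceberg API (Table.updateProperties().set(...).commit()); here it is reduced to a map update.

```java
import java.util.HashMap;
import java.util.Map;

final class LostUpdateSketch {
    static final String LOCATION = "metadata_location";
    static final String STATS_TIME = "impala.lastComputeStatsTime";

    /** Hypothetical stand-in for an HMS table: just a property map. */
    static final class ToyMetastore {
        private final Map<String, String> props = new HashMap<>();

        /** Returns a copy, like a table object loaded by an engine. */
        Map<String, String> getTable() { return new HashMap<>(props); }

        /** Models alter_table(): the stored table is replaced wholesale. */
        void alterTable(Map<String, String> replacement) {
            props.clear();
            props.putAll(replacement);
        }

        /** Models a property-level update that touches a single key. */
        void setProperty(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Stale copy + whole-table replace: the concurrent commit is lost. */
    static String wholeTableReplace() {
        ToyMetastore hms = new ToyMetastore();
        hms.setProperty(LOCATION, "v1.metadata.json");
        Map<String, String> stale = hms.getTable();     // COMPUTE STATS loads the table
        hms.setProperty(LOCATION, "v2.metadata.json");  // another engine commits
        stale.put(STATS_TIME, "1700000000");
        hms.alterTable(stale);                          // overwrites v2 with v1
        return hms.get(LOCATION);
    }

    /** Property-level update: only the stats timestamp is written. */
    static String propertyLevelUpdate() {
        ToyMetastore hms = new ToyMetastore();
        hms.setProperty(LOCATION, "v1.metadata.json");
        hms.setProperty(LOCATION, "v2.metadata.json");  // another engine commits
        hms.setProperty(STATS_TIME, "1700000000");      // does not touch the location
        return hms.get(LOCATION);
    }

    /** Reload right before replacing: narrows the race window, does not close it. */
    static String reloadThenReplace() {
        ToyMetastore hms = new ToyMetastore();
        hms.setProperty(LOCATION, "v1.metadata.json");
        hms.setProperty(LOCATION, "v2.metadata.json");  // another engine commits
        Map<String, String> fresh = hms.getTable();     // reload just before altering
        fresh.put(STATS_TIME, "1700000000");
        hms.alterTable(fresh);
        return hms.get(LOCATION);
    }

    public static void main(String[] args) {
        System.out.println("whole-table replace: " + wholeTableReplace());
        System.out.println("property update:     " + propertyLevelUpdate());
        System.out.println("reload then replace: " + reloadThenReplace());
    }
}
```

In this toy model the whole-table replace ends with 'metadata_location' pointing back at the older file, while the other two paths keep the newer one. The reload variant only helps when no commit lands between the reload and the replace, which is why the description treats it as reducing the possibility of loss rather than eliminating it.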



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
