Saulius Valatka created IMPALA-13499:
----------------------------------------
Summary: REFRESH on Iceberg tables can lead to data loss
Key: IMPALA-13499
URL: https://issues.apache.org/jira/browse/IMPALA-13499
Project: IMPALA
Issue Type: Bug
Components: Catalog
Affects Versions: Impala 4.4.1
Reporter: Saulius Valatka
When a REFRESH statement runs on an Iceberg table, the catalog loads the table from
the Hive Metastore and later performs an {{alter_table}}
[here|https://github.com/apache/impala/blob/bdce7778b239f6fbf8ea89ea32b91a83c8017828/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L445].
It does so without taking a Hive lock. If any external process commits to the
table between the load and the alter, the newly committed "metadata_location"
property is overwritten with the stale value, so the table points back at the
old metadata and the external commit is effectively lost (data loss).
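The lost update can be reproduced with a small, self-contained Java model. This is a hypothetical sketch: {{ToyMetastore}}, {{simulate}} and the v1/v2 file names are illustrative and are not Impala or Hive APIs. The one assumption carried over from the issue is that the unlocked alter writes back the table parameters captured at load time, including a "metadata_location" that has since gone stale.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for a metastore table entry. It is NOT the
 * Hive Metastore API; it only models an unconditional alter in which the
 * caller's parameters replace whatever is currently stored.
 */
public class RefreshRaceDemo {
    static final String KEY = "metadata_location";

    static final class ToyMetastore {
        private Map<String, String> params = new HashMap<>();

        // Returns a snapshot copy, like a table object loaded by the catalog.
        synchronized Map<String, String> getTable() {
            return new HashMap<>(params);
        }

        // Unconditional write: last writer wins, no lock, no version check.
        synchronized void alterTable(Map<String, String> newParams) {
            params = new HashMap<>(newParams);
        }
    }

    /** Replays the interleaving from the issue and returns the final metadata_location. */
    static String simulate() {
        ToyMetastore hms = new ToyMetastore();
        Map<String, String> initial = new HashMap<>();
        initial.put(KEY, "v1.metadata.json");
        hms.alterTable(initial);

        // 1. REFRESH loads the table; its snapshot still says v1.
        Map<String, String> loadedByRefresh = hms.getTable();

        // 2. An external process commits and moves metadata_location to v2.
        Map<String, String> external = hms.getTable();
        external.put(KEY, "v2.metadata.json");
        hms.alterTable(external);

        // 3. REFRESH alters the table with parameters that still carry v1.
        hms.alterTable(loadedByRefresh);

        return hms.getTable().get(KEY);
    }

    public static void main(String[] args) {
        // Prints v1.metadata.json: the external v2 commit has been overwritten.
        System.out.println(simulate());
    }
}
```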
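It should either take a Hive lock while doing this or, if
{{iceberg.engine.hive.lock-enabled = false}}, use
{{alter_table_with_environmentContext}} and set
{{expected_parameter_key}} / {{expected_parameter_value}} to {{metadata_location}} /
<previous version> (the "metadata_location" value read at load time), so that the
metastore rejects the alter if the location has changed in the meantime.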
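A minimal sketch of the second proposal, again using a hypothetical in-memory model rather than the real Hive Metastore client. {{alterTableIfExpected}} applies the write only if "metadata_location" still holds the value REFRESH loaded, which is the check that {{expected_parameter_key}} / {{expected_parameter_value}} ask the metastore to perform. In a real client the request would go through {{alter_table_with_environmentContext}} with those keys set on the environment context; that call is not reproduced here, and all other names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical toy model of a conditional (compare-and-swap) alter. It mimics
 * the intent of expected_parameter_key / expected_parameter_value; it is not
 * the Hive Metastore implementation.
 */
public class ConditionalAlterDemo {
    static final String KEY = "metadata_location";

    static final class ToyMetastore {
        private Map<String, String> params = new HashMap<>();

        synchronized Map<String, String> getTable() {
            return new HashMap<>(params);
        }

        synchronized void alterTable(Map<String, String> newParams) {
            params = new HashMap<>(newParams);
        }

        // Reject the write if expectedKey no longer holds expectedValue.
        synchronized void alterTableIfExpected(Map<String, String> newParams,
                                               String expectedKey, String expectedValue) {
            String current = params.get(expectedKey);
            if (!Objects.equals(expectedValue, current)) {
                throw new IllegalStateException(expectedKey + " changed: expected "
                        + expectedValue + " but found " + current);
            }
            params = new HashMap<>(newParams);
        }
    }

    /** Same interleaving as the race above, but the stale REFRESH write is rejected. */
    static String simulate() {
        ToyMetastore hms = new ToyMetastore();
        Map<String, String> initial = new HashMap<>();
        initial.put(KEY, "v1.metadata.json");
        hms.alterTable(initial);

        // REFRESH loads the table and remembers <previous version>.
        Map<String, String> loadedByRefresh = hms.getTable();
        String previousLocation = loadedByRefresh.get(KEY);

        // External commit moves metadata_location to v2.
        Map<String, String> external = hms.getTable();
        external.put(KEY, "v2.metadata.json");
        hms.alterTable(external);

        try {
            hms.alterTableIfExpected(loadedByRefresh, KEY, previousLocation);
        } catch (IllegalStateException e) {
            // The conditional alter fails instead of clobbering v2.
        }
        return hms.getTable().get(KEY);
    }

    public static void main(String[] args) {
        // Prints v2.metadata.json: the external commit survives.
        System.out.println(simulate());
    }
}
```

On rejection, the catalog would presumably reload the table and retry rather than fall back to an unconditional {{alter_table}}.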
--
This message was sent by Atlassian Jira
(v8.20.10#820010)