Saulius Valatka created IMPALA-13499:
----------------------------------------

             Summary: REFRESH on Iceberg tables can lead to data loss
                 Key: IMPALA-13499
                 URL: https://issues.apache.org/jira/browse/IMPALA-13499
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 4.4.1
            Reporter: Saulius Valatka


When running a REFRESH statement on an Iceberg table, the catalog loads the table from 
the Hive metastore and later performs an {{alter_table}} 
[here|https://github.com/apache/impala/blob/bdce7778b239f6fbf8ea89ea32b91a83c8017828/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L445].
 It does so without taking a Hive lock, so if an external process 
commits to the table between the load and the alter, the newly committed 
"metadata_location" property is overwritten with the previous value, which 
effectively results in data loss.

The catalog should either take a Hive lock when doing this or, if 
{{iceberg.engine.hive.lock-enabled = false}}, use 
{{alter_table_with_environmentContext}} and set 
{{expected_parameter_key}} / {{expected_parameter_value}} to metadata_location / 
<previous version>.
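
To make the race and the proposed conditional update concrete, here is a minimal, self-contained sketch. It does not use the Impala or Hive APIs: {{FakeMetastore}}, {{alterTableIfMatches}}, {{RefreshRaceDemo}} and the {{v1}}/{{v2}} file names are hypothetical stand-ins, and the conditional write only approximates what an expected-key / expected-value check would do on the metastore side.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the table parameters stored in the
// Hive metastore. None of these names are real Impala or Hive APIs.
class FakeMetastore {
    private final Map<String, String> params = new HashMap<>();

    FakeMetastore(String metadataLocation) {
        params.put("metadata_location", metadataLocation);
    }

    synchronized String get(String key) {
        return params.get(key);
    }

    // Unconditional write: models a plain alter_table issued without a lock.
    synchronized void alterTable(String key, String value) {
        params.put(key, value);
    }

    // Conditional write: models an expected-key / expected-value check.
    // The update is applied only if the stored value still equals 'expected'.
    synchronized boolean alterTableIfMatches(String key, String expected, String value) {
        if (!expected.equals(params.get(key))) {
            return false; // someone else committed in the meantime
        }
        params.put(key, value);
        return true;
    }
}

class RefreshRaceDemo {
    // Returns the final metadata_location after a refresh races with an
    // external commit that lands between the refresh's load and its alter.
    static String run(boolean conditional) {
        FakeMetastore hms = new FakeMetastore("v1.metadata.json");

        // Refresh, step 1: load the table (sees v1).
        String loaded = hms.get("metadata_location");

        // An external writer commits v2 between load and alter.
        hms.alterTable("metadata_location", "v2.metadata.json");

        // Refresh, step 2: write the table back using the stale snapshot.
        if (conditional) {
            // Rejected here, because the stored value is no longer 'loaded'.
            hms.alterTableIfMatches("metadata_location", loaded, loaded);
        } else {
            // Silently overwrites v2 with the stale v1 pointer.
            hms.alterTable("metadata_location", loaded);
        }
        return hms.get("metadata_location");
    }

    public static void main(String[] args) {
        System.out.println("unconditional: " + run(false));
        System.out.println("conditional:   " + run(true));
    }
}
```

In this sketch the unconditional path ends with the table pointing at the stale {{v1}} metadata file, while the conditional path leaves the external commit ({{v2}}) in place. A real fix would also need to decide what to do when the conditional write is rejected, for example reloading the table and retrying.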



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
