Noemi Pap-Takacs created IMPALA-12708:
-----------------------------------------

             Summary: An UPDATE creates 2 new snapshots in Iceberg tables
                 Key: IMPALA-12708
                 URL: https://issues.apache.org/jira/browse/IMPALA-12708
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Noemi Pap-Takacs


The UPDATE statement is now supported for Iceberg tables in Impala.

The implementation creates the delete file(s) and the new data file(s) for the 
updated row(s). These files are committed in one Iceberg transaction, but the 
transaction adds two snapshots to the table. The first snapshot contains the 
delete file(s); the second adds the new data file(s) of the updated row(s).

This results in an unusual table history. The first, temporary snapshot of the 
transaction has no time information associated with it (the table spends 0 
time in that state), and it does not appear as a separate entry when we query 
the table history. Therefore it cannot be queried with time travel based on 
system time. However, it does appear in the history as the parent of the 
current snapshot, and it can be queried by snapshot id, which returns results 
from an invalid table state.
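
The behaviour described above can be illustrated with a small toy model. This 
is only a sketch under stated assumptions: the Snapshot class, the history 
list, the timestamps and the helper functions are all hypothetical and are not 
Impala or Iceberg code. It assumes that only the final snapshot of a 
transaction gets a history entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not the Iceberg class)."""
    snapshot_id: int
    parent_id: Optional[int]
    rows: frozenset  # rows visible in this table state


# Table state before the UPDATE: two rows.
base = Snapshot(1, None, frozenset({("a", 1), ("b", 2)}))

# UPDATE t SET v = 10 WHERE k = 'a', modelled as two snapshots in one
# transaction: first the delete, then the insert of the new row version.
after_delete = Snapshot(2, base.snapshot_id, base.rows - {("a", 1)})
after_insert = Snapshot(3, after_delete.snapshot_id,
                        after_delete.rows | {("a", 10)})

snapshots = {s.snapshot_id: s for s in (base, after_delete, after_insert)}

# Assumption: only the snapshot that is current when a transaction commits
# gets a (timestamp_ms, snapshot_id) history entry. Timestamps are made up.
history = [(100, base.snapshot_id), (200, after_insert.snapshot_id)]


def as_of_time(ts_ms):
    """Time travel by system time: latest history entry at or before ts_ms."""
    eligible = [sid for t, sid in history if t <= ts_ms]
    return snapshots[eligible[-1]] if eligible else None


def as_of_snapshot(snapshot_id):
    """Time travel by snapshot id: any snapshot can be addressed directly."""
    return snapshots[snapshot_id]
```

In this model no timestamp ever resolves to snapshot 2, yet it is the parent 
of the current snapshot, and as_of_snapshot(2) returns a state in which row 
'a' is missing entirely.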

Impala should create only one new snapshot per UPDATE statement, so that the 
parent of the current snapshot points to the previous valid table state.
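
The expected outcome can be sketched with a toy model in which the deletes and 
the new rows are applied together. The Snapshot class and the apply_update 
helper are hypothetical names used for illustration only; this is not how 
Impala or Iceberg implement the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not the Iceberg class)."""
    snapshot_id: int
    parent_id: Optional[int]
    rows: frozenset  # rows visible in this table state


def apply_update(base, deleted_rows, added_rows, new_id):
    """Apply deletes and inserts in one step, producing a single snapshot."""
    return Snapshot(new_id, base.snapshot_id,
                    (base.rows - deleted_rows) | added_rows)


# Table state before the UPDATE.
base = Snapshot(1, None, frozenset({("a", 1), ("b", 2)}))

# UPDATE t SET v = 10 WHERE k = 'a', modelled as one combined change.
current = apply_update(base,
                       deleted_rows=frozenset({("a", 1)}),
                       added_rows=frozenset({("a", 10)}),
                       new_id=2)
```

Here the parent of the current snapshot is the pre-UPDATE state, and no 
intermediate state without row 'a' is ever addressable.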



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

