Noemi Pap-Takacs created IMPALA-12708:
-----------------------------------------
Summary: An UPDATE creates 2 new snapshots in Iceberg tables
Key: IMPALA-12708
URL: https://issues.apache.org/jira/browse/IMPALA-12708
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Noemi Pap-Takacs
The UPDATE statement is now supported for Iceberg tables in Impala.
The implementation creates the delete file(s) and the new data file(s) for the
updated row(s). These files are committed in one Iceberg transaction, but the
transaction adds two snapshots to the table. The first contains the delete
file(s), the second adds the new data file(s) of the updated row(s).
This results in an unusual table history. The first (temporary) snapshot of the
transaction has no time information associated with it (the table spends 0 time
in that state), and it does not appear as a separate entry when we query the
table history. Therefore it cannot be queried with time travel based on system
time. However, it does appear in the history as the parent of the current
snapshot, and it can be queried by snapshot id, which returns results from an
invalid table state: the old rows are already deleted, but the updated rows are
not yet added.
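The resulting history can be illustrated with a small toy model. The Python
sketch below is not Impala or Iceberg code: the Snapshot class, the snapshots
map, the history list, the timestamps and the file names are all hypothetical
and only mimic the behaviour described in this report (one transaction, two
snapshots, a history entry for the last one only).

```python
from dataclasses import dataclass
from typing import Optional

# Toy model only: these names are hypothetical, not Iceberg or Impala APIs.
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: frozenset    # data files visible in this snapshot
    delete_files: frozenset  # delete files visible in this snapshot

snapshots = {}  # snapshot_id -> Snapshot
history = []    # (timestamp, snapshot_id) entries, as a history query would list them

def add_snapshot(snapshot_id, parent_id, data_files, delete_files):
    snap = Snapshot(snapshot_id, parent_id,
                    frozenset(data_files), frozenset(delete_files))
    snapshots[snapshot_id] = snap
    return snap

# State before the UPDATE: one data file, one history entry.
add_snapshot(1, None, {"data-1"}, set())
history.append((100, 1))

# The UPDATE as described in this report: one transaction, two snapshots.
add_snapshot(2, 1, {"data-1"}, {"delete-1"})            # old rows deleted only
add_snapshot(3, 2, {"data-1", "data-2"}, {"delete-1"})  # updated rows added
history.append((200, 3))  # only the final snapshot gets a history entry

def snapshot_as_of(timestamp):
    """Time travel by system time: latest history entry at or before timestamp."""
    ids = [sid for ts, sid in history if ts <= timestamp]
    return snapshots[ids[-1]] if ids else None

current = snapshots[history[-1][1]]
intermediate = snapshots[current.parent_id]
```

In this model snapshot_as_of() can never return snapshot 2, yet
current.parent_id points at it, and reading it by id sees delete-1 without
data-2.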
Impala should create only 1 new snapshot per UPDATE statement, so that the
parent of the current snapshot points to the previous valid table state.
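A minimal sketch of the requested behaviour, again as a toy model rather than
Impala or Iceberg code. The commit_update function, the dictionary layout and
the file names are hypothetical; the point is only that the delete file(s) and
the new data file(s) land in one snapshot whose parent is the previous valid
table state.

```python
# Toy model only: hypothetical names, not Iceberg or Impala APIs.
def commit_update(snapshots, current_id, new_id, new_data_files, new_delete_files):
    """Commit an UPDATE as a single snapshot holding both deletes and new data."""
    parent = snapshots[current_id]
    snapshots[new_id] = {
        "parent_id": current_id,  # previous valid table state
        "data_files": parent["data_files"] | set(new_data_files),
        "delete_files": parent["delete_files"] | set(new_delete_files),
    }
    return new_id

# Table state before the UPDATE: a single snapshot with one data file.
snapshots = {1: {"parent_id": None, "data_files": {"data-1"}, "delete_files": set()}}

# One UPDATE adds exactly one snapshot.
current_id = commit_update(snapshots, 1, 2, {"data-2"}, {"delete-1"})
```

Here the table never exposes a state in which the deletes are applied without
the updated rows, and the parent of snapshot 2 is snapshot 1.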
--
This message was sent by Atlassian Jira
(v8.20.10#820010)