[ https://issues.apache.org/jira/browse/IMPALA-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808373#comment-17808373 ]
ASF subversion and git services commented on IMPALA-12708:
----------------------------------------------------------
Commit b372f87b620b9d240059fb6e098f62685c14e15e in impala's branch
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b372f87b6 ]
IMPALA-12708: An UPDATE creates 2 new snapshots in Iceberg tables
The current implementation of UPDATE creates the delete file(s) and the
new data file(s) for the updated row(s). These files are committed in
one Iceberg transaction, but the transaction adds two snapshots to the
table. The first contains the delete file(s), the second adds the new
data file(s) of the updated row(s). Only the final snapshot (which
holds the consistent table state) is observable by concurrent readers,
but still, the commit history can look strange with these "phantom
snapshots".
With this change, instead of performing a RowDelta and an AppendFiles
operation in a single transaction, we perform a single RowDelta
operation only, so the UPDATE adds just one snapshot.
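The difference can be illustrated with a small Python sketch that models a
table's snapshot chain. This is not Impala or Iceberg code: ToyTable,
Snapshot, commit_transaction and the file names are hypothetical, and the
model simply assumes that each operation in a transaction adds one snapshot,
as described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    operation: str          # e.g. "RowDelta" or "AppendFiles"
    files: List[str]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table's snapshot history."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit_transaction(self, operations: List[Tuple[str, List[str]]]) -> None:
        # All operations become visible together, but each adds a snapshot.
        staged = list(self.snapshots)
        for op_name, files in operations:
            parent = staged[-1].snapshot_id if staged else None
            staged.append(Snapshot(len(staged) + 1, parent, op_name, files))
        self.snapshots = staged


def make_table() -> ToyTable:
    table = ToyTable()
    table.commit_transaction([("AppendFiles", ["data-1.parquet"])])  # initial INSERT
    return table


def update_old(table: ToyTable) -> None:
    # Old behaviour: two operations in one transaction, hence two snapshots.
    table.commit_transaction([
        ("RowDelta", ["delete-1.parquet"]),
        ("AppendFiles", ["data-2.parquet"]),
    ])


def update_new(table: ToyTable) -> None:
    # New behaviour: one RowDelta carries both the deletes and the new rows.
    table.commit_transaction([
        ("RowDelta", ["delete-1.parquet", "data-2.parquet"]),
    ])


old = make_table()
update_old(old)
new = make_table()
update_new(new)
```

In this model, the parent of the current snapshot in `old` is the
intermediate delete-only snapshot, while in `new` it is the snapshot that
existed before the UPDATE.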
Another issue was that we also committed empty operations (e.g. UPDATEs
that modify zero records), which created redundant snapshots in the
table history. This patch fixes that as well.
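A minimal Python sketch of the idea, assuming the fix amounts to skipping
the commit when an operation produced no files. The message does not show
how the patch implements this, so commit_update and the dictionary layout
below are hypothetical.

```python
from typing import List


def commit_update(history: List[dict], delete_files: List[str],
                  data_files: List[str]) -> bool:
    """Append one snapshot for an UPDATE, or nothing if it produced no files."""
    if not delete_files and not data_files:
        # Nothing to commit: leave the table history untouched.
        return False
    history.append({
        "snapshot_id": len(history) + 1,
        "operation": "RowDelta",
        "delete_files": list(delete_files),
        "data_files": list(data_files),
    })
    return True


history: List[dict] = []
committed_empty = commit_update(history, [], [])
committed_real = commit_update(history, ["delete-1.parquet"], ["data-2.parquet"])
```

Only the second call adds an entry to `history`; the empty UPDATE leaves
it unchanged.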
Testing:
* added e2e test that checks the table history
Change-Id: I2ceb80b939c644388707b21061bf55451234dcd3
Reviewed-on: http://gerrit.cloudera.org:8080/20903
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
> An UPDATE creates 2 new snapshots in Iceberg tables
> ---------------------------------------------------
>
> Key: IMPALA-12708
> URL: https://issues.apache.org/jira/browse/IMPALA-12708
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Noemi Pap-Takacs
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: iceberg, impala-iceberg
>
> The UPDATE statement is now supported for Iceberg tables in Impala.
> The implementation creates the delete file(s) and the new data file(s) for
> the updated row(s). These files are committed in one Iceberg transaction, but
> the transaction adds two snapshots to the table. The first contains the
> delete file(s), the second adds the new data file(s) of the updated row(s).
> This results in an unusual table history, because the first (temporary)
> snapshot of the transaction has no time information associated with it
> (the table spends zero time in that state), and it does not appear as a
> separate entry when we query the table history. Therefore it cannot be
> queried with time travel based on system time. However, it appears in the
> history as the parent of the current snapshot, and it can be queried by
> snapshot id, which returns results from an invalid table state.
> Impala should create only 1 new snapshot per UPDATE statement, so that the
> parent of the current snapshot points to the previous valid table state.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)