rdblue commented on code in PR #14947:
URL: https://github.com/apache/iceberg/pull/14947#discussion_r2687531324
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -465,6 +464,10 @@ public void commit() {
// to ensure that if a concurrent operation assigns the
UUID, this operation will
// not fail.
taskOps.commit(base, updated.withUUID());
+
+ // Set the staged snapshot AFTER successful commit to ensure
cleanup uses
+ // the actually committed snapshot, not one from a
failed/no-op attempt
+ stagedSnapshot.set(newSnapshot);
Review Comment:
I'm -1 on this change.
First, there should be no code following the `TableOperations.commit` call.
Anything after the `commit` call creates a risk that the result of a successful
commit is hidden by an error in the later code. This caused the original issues
where commit states are unknown.
Second, if there is a problem with the staged snapshot not matching what was
committed, then we should roll back the changes from #10523 that changed from
looking up the snapshot in committed metadata by ID to keeping a reference to
the snapshot being committed. That's a better fix than moving the
`stagedSnapshot.set` call.
However, having the wrong snapshot reference is not the root cause. The
problem is a bad retry caused by mishandling errors in the table operations
code. According to the issue, #14583, `HiveTableOperations` throws a
`CommitFailedException`, which signals to this code that the commit did not
occur and should be retried. That's not correct because the commit did happen.
A retry here results in noop detection but the commit was not a noop.
The problem is in the recovery code in `HiveTableOperations`, not here. That
said, I think it would be safer to revert the change from #10523 so that the
committed snapshot is fetched from the actual table state.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]