rdblue commented on a change in pull request #2328:
URL: https://github.com/apache/iceberg/pull/2328#discussion_r594597450
##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -293,8 +293,15 @@ public void commit() {
taskOps.commit(base, updated.withUUID());
});
+ } catch (CommitFailedException commitFailedException) {
+ // We have an acknowledged failure from the Catalog. We are confident
that the commit has not been applied
+ Exceptions.suppressAndThrow(commitFailedException, this::cleanAll);
} catch (RuntimeException e) {
- Exceptions.suppressAndThrow(e, this::cleanAll);
+ LOG.error("Cannot determine whether the commit was successful or not,
the underlying data files may or " +
Review comment:
It's very dangerous to change the logic here. If there is an exception
where it is unknown whether or not the table reference was updated, then it is
safer to drop the files and assume that the commit was not successful. That's
because we still need to propagate the exception, which signals that the commit
failed. Most of the time, that will cause a higher-level retry of the whole
operation, like re-running a scheduled job with the same data. If that happens
and both commits are actually successful, then we have a situation where
incoming records were silently duplicated.
I like the idea of throwing `UnknownCommitStateException`, but I think we
would need some way to ensure that it is actually handled. An idempotent
committer (like the Flink sink) could simply swallow the exception and retry.
But without coordination between the `TableOperations` that decides whether to
throw this exception and the committer I think there is still a problem. Could
we add an option to `SnapshotProducer` to signal whether it would be safe and
then skip cleanup if the flag is set?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]