rdblue commented on a change in pull request #2328:
URL: https://github.com/apache/iceberg/pull/2328#discussion_r594597450



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -293,8 +293,15 @@ public void commit() {
             taskOps.commit(base, updated.withUUID());
           });
 
+    } catch (CommitFailedException commitFailedException) {
+      // We have an acknowledged failure from the Catalog. We are confident 
that the commit has not been applied
+      Exceptions.suppressAndThrow(commitFailedException, this::cleanAll);
     } catch (RuntimeException e) {
-      Exceptions.suppressAndThrow(e, this::cleanAll);
+      LOG.error("Cannot determine whether the commit was successful or not, 
the underlying data files may or " +

Review comment:
       It's very dangerous to change the logic here. If there is an exception 
where it is unknown whether or not the table reference was updated, then it is 
safer to drop the files and assume that the commit was not successful. That's 
because we still need to propagate the exception, which signals that the commit 
failed. Most of the time, that will cause a higher-level retry of the whole 
operation, like re-running a scheduled job with the same data. If that happens 
and both commits are actually successful, then we have a situation where 
incoming records were silently duplicated.
   
   I like the idea of throwing `UnknownCommitStateException`, but I think we 
would need some way to ensure that it is actually handled. An idempotent 
committer (like the Flink sink) could simply swallow the exception and retry. 
But without coordination between the `TableOperations` that decides whether to 
throw this exception and the committer I think there is still a problem. Could 
we add an option to `SnapshotProducer` to signal whether it would be safe and 
then skip cleanup if the flag is set?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to