aokolnychyi commented on a change in pull request #2328:
URL: https://github.com/apache/iceberg/pull/2328#discussion_r595461100
##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -293,8 +293,15 @@ public void commit() {
taskOps.commit(base, updated.withUUID());
});
+ } catch (CommitFailedException commitFailedException) {
+ // We have an acknowledged failure from the Catalog. We are confident
that the commit has not been applied
+ Exceptions.suppressAndThrow(commitFailedException, this::cleanAll);
} catch (RuntimeException e) {
- Exceptions.suppressAndThrow(e, this::cleanAll);
+ LOG.error("Cannot determine whether the commit was successful or not,
the underlying data files may or " +
Review comment:
I thought more about this today and I am still convinced deleting files
in an attempt to fail other concurrent operations (which is not guaranteed) and
requiring a manual intervention to restore the table is NOT something we should
do due to the described consequences and cases that we won't handle.
I also like throwing an exception rather than hanging forever as this seems
to be more descriptive. I think the error message Russell throws right now is
pretty elaborate and it should be propagated as a cause of `SparkException`. As
a user, I'd prefer my job to fail with `UnknownCommitStateException` rather
than with a job timeout exception and no signs of what went wrong.
Internally, we offer job retry mechanisms but we explicitly warn our users
that they may enable it only if their jobs are idempotent. We can try to reduce
the chances of retries but I don't think we can solve that completely when we
loose the metastore connectivity. I think it is reasonable to explore how we
can skip already committed batch operations like we do in Structured Streaming
but that can be done in a follow-up PR.
It may be also a good call to check the commit state more than once for some
reasonable time frame but it shouldn't be for too long to avoid job timeouts.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]