RussellSpitzer commented on code in PR #4673:
URL: https://github.com/apache/iceberg/pull/4673#discussion_r862295750
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -343,11 +343,15 @@ public void commit() {
LOG.warn("Failed to load committed snapshot, skipping manifest
clean-up");
}
- } catch (RuntimeException e) {
- LOG.warn("Failed to load committed table metadata, skipping manifest
clean-up", e);
+ } catch (Throwable e) {
Review Comment:
@stevenzwu It's not in our code, it's in Spark's code where it doesn't have
a defined exception for either of those states. Apache Spark basically has this
code
```java
try {
DataSource.commit()
} catch (anything) {
datasource.abort()
}
```
[source](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala#L392-L400)
Spark doesn't know about any of our particular exceptions, it just wants to
know if it's call to DataSource.commit failed or not. In this particular
instance the worry is that the Iceberg commit did succeed and no exceptions
were thrown but additional code ran after the Iceberg commit (like our clean up
code or notify listeners) which throws another exception. This exception is
surfaced up as being thrown by DataSource.commit() and causes Spark to call
DataSource.abort().
Spark itself cannot distinguish between an exception that happened before
the actual commit and after the actual commit within our DataSource commit
method. If we wanted to let engines handle this we could have our Iceberg
SparkWrite try to handle this but I think that would probably be difficult to
manage as well. I think it more sense to say that once committed, we suppress
all exceptions. Our Commit method in SnapshotProducer only throws exceptions
when the commit fails and in *no* other circumstances.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]