Kontinuation commented on issue #5752:
URL: https://github.com/apache/iceberg/issues/5752#issuecomment-1244864514

   This seems to be an issue in the [trino-iceberg plugin](https://github.com/trinodb/trino/blob/385/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperations.java#L92-L98), which throws `CommitFailedException` in a case where it should not.
   
   Committing a transaction takes roughly two steps:
   
   1. Write the metadata of the new snapshot to file storage.
   2. Update the catalog to point to the location of the new metadata. This is the step where `CommitFailedException` can be thrown, after the snapshot has already been persisted to file storage.
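   The two steps can be sketched as below. This is a minimal illustration under stated assumptions: file storage is an in-memory map, the catalog is a single pointer updated by compare-and-swap, and `TwoStepCommit` and the local `CommitFailedException` are hypothetical stand-ins, not the Iceberg or Trino classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: the catalog rejected the pointer update.
class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
        super(message);
    }
}

// Hypothetical in-memory model of file storage plus a catalog pointer;
// illustrative names only, not Iceberg or Trino APIs.
class TwoStepCommit {
    final Map<String, String> fileStorage = new HashMap<>();
    private String catalogPointer;

    TwoStepCommit(String initialLocation, String initialMetadata) {
        fileStorage.put(initialLocation, initialMetadata);
        catalogPointer = initialLocation;
    }

    synchronized String currentLocation() {
        return catalogPointer;
    }

    // Compare-and-swap on the catalog pointer (compares with equals()).
    private synchronized boolean swapPointer(String expected, String next) {
        if (!catalogPointer.equals(expected)) {
            return false;
        }
        catalogPointer = next;
        return true;
    }

    void commit(String baseLocation, String newLocation, String newMetadata) {
        // Step 1: write the new metadata file. Readers cannot see it yet,
        // because the catalog still points at the old location.
        fileStorage.put(newLocation, newMetadata);

        // Step 2: point the catalog at the new file, but only if nobody
        // else committed since baseLocation was read.
        if (!swapPointer(baseLocation, newLocation)) {
            // The file written in step 1 is left behind, unreferenced.
            throw new CommitFailedException("catalog no longer points at " + baseLocation);
        }
    }
}
```

   A failed step 2 leaves the step 1 file in storage, which is the file the rollback question is about.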
   
   This issue is about whether we can roll back step 1 when step 2 fails. Step 2 can fail in many ways. In some cases we are fairly sure that the update failed and that the table's metadata location remained unchanged; examples of such failures are `CommitFailedException` and `org.apache.hadoop.hive.metastore.api.AlreadyExistsException`. In these cases we can safely roll back step 1.
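   Rolling back only on a known failure can be sketched as follows. The sketch assumes that `CommitFailedException` is the only exception known to leave the catalog unchanged; a real implementation would also list cases such as the metastore's `AlreadyExistsException`. `CommitWithRollback`, `CatalogUpdate`, the local exception class and the in-memory storage map are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a *known* failure: the catalog was left unchanged.
class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
        super(message);
    }
}

// Hypothetical catalog update (step 2); may throw anything.
interface CatalogUpdate {
    void pointTo(String newLocation) throws Exception;
}

class CommitWithRollback {
    final Map<String, String> fileStorage = new HashMap<>();

    void commit(String newLocation, String newMetadata, CatalogUpdate catalog) throws Exception {
        // Step 1: persist the new metadata file.
        fileStorage.put(newLocation, newMetadata);
        try {
            // Step 2: update the catalog pointer. Exceptions other than
            // CommitFailedException propagate with no cleanup, because
            // the catalog may already reference newLocation.
            catalog.pointTo(newLocation);
        } catch (CommitFailedException e) {
            // Known failure: nothing references the new file, so deleting
            // it is safe.
            fileStorage.remove(newLocation);
            throw e;
        }
    }
}
```

   Only the known-failure branch deletes the new file; every other failure leaves it in place for the caller to resolve.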
   
   For most other types of failure, such as the `java.net.SocketTimeoutException` mentioned in this issue, we don't know whether the catalog was actually updated. We should not blindly roll back step 1 in that case. The Hive metastore catalog implementation in Iceberg [does a second check](https://github.com/apache/iceberg/blob/0.14.x/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L278-L288) when an unexpected exception is raised during the metastore update.
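   A second check in this spirit might look like the following simplified sketch. It assumes the catalog can be re-read to get the current metadata location followed by the previous ones, supplied here through a hypothetical `metadataHistory` supplier. `CommitStatusCheck` and `CommitStatus` are illustrative names, not the code behind the link above.

```java
import java.util.List;
import java.util.function.Supplier;

// Outcome of re-checking the catalog after an unexpected error.
enum CommitStatus { SUCCESS, FAILURE, UNKNOWN }

class CommitStatusCheck {
    // metadataHistory (hypothetical) re-reads the catalog and returns the
    // current metadata location followed by previous ones. Assumes the
    // history has not been truncated since this commit was attempted.
    static CommitStatus check(String newLocation, Supplier<List<String>> metadataHistory) {
        List<String> history;
        try {
            history = metadataHistory.get();
        } catch (RuntimeException e) {
            // The catalog cannot be read either, so the outcome is still unknown.
            return CommitStatus.UNKNOWN;
        }
        // If our location is current, or was current before a later commit
        // replaced it, step 2 took effect.
        return history.contains(newLocation) ? CommitStatus.SUCCESS : CommitStatus.FAILURE;
    }
}
```

   Only `FAILURE` makes it safe to roll back step 1; `UNKNOWN` should be surfaced to the caller without deleting anything.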
   
   To my understanding, `CommitFailedException` in Iceberg indicates a known commit failure, where the catalog is guaranteed to be left unmodified (most of the time it is thrown during metadata consistency validation). It should not be thrown for socket errors that occur while updating the table in the Hive metastore.
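   Following that reasoning, a plugin could translate metastore errors roughly as below: only errors that definitely mean "nothing was changed" become `CommitFailedException`, and everything else, including socket timeouts, is reported as an unknown commit state. All classes here are local, hypothetical stand-ins; `MetastoreRejectedException` represents whatever the metastore client throws for a definite rejection.

```java
// Local stand-ins; the names are illustrative, not library classes.
class CommitFailedException extends RuntimeException {
    CommitFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class CommitStateUnknownException extends RuntimeException {
    CommitStateUnknownException(Throwable cause) {
        super(cause);
    }
}

// Hypothetical: the metastore definitely rejected the update (for example a
// failed lock or "already exists" check), so nothing was changed.
class MetastoreRejectedException extends Exception {
    MetastoreRejectedException(String message) {
        super(message);
    }
}

class CommitErrors {
    static RuntimeException translate(Exception e) {
        if (e instanceof MetastoreRejectedException) {
            // Known failure: the catalog is unchanged, callers may clean up.
            return new CommitFailedException("commit rejected by metastore", e);
        }
        // Socket timeouts and other errors: the update may or may not have
        // been applied, so do not report a known failure.
        return new CommitStateUnknownException(e);
    }
}
```

   Defaulting to the unknown state is the conservative choice: the worst outcome is an orphaned metadata file, whereas treating an applied update as failed could lead to deleting the metadata file the catalog now points to.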
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

