rdsr commented on a change in pull request #2328:
URL: https://github.com/apache/iceberg/pull/2328#discussion_r593566936



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -217,11 +239,40 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
       throw new RuntimeException("Interrupted during commit", e);
 
     } finally {
-      cleanupMetadataAndUnlock(threw, newMetadataLocation, lockId);
+      cleanupMetadataAndUnlock(commitStatus, newMetadataLocation, lockId);
+    }
+  }
+
+  /**
+   * Attempt to load the table and see if any current or past metadata location matches the one we were attempting
+   * to set. This is used as a last resort when we are dealing with exceptions that may indicate the commit has
+   * failed but are not proof that this is the case. Past locations must also be searched on the chance that a second
+   * committer was able to successfully commit on top of our commit.

Review comment:
       good call!
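
       To make the check described in that Javadoc concrete, here is a minimal sketch of the decision it implies. It is not the PR's implementation: `ReloadedMetadata` is a hypothetical stand-in for whatever a fresh table load returns, and the `FAILURE` and `UNKNOWN` values are assumed (only `SUCCESS` appears in the diff).

```java
import java.util.Arrays;
import java.util.List;

class CommitStatusSketch {
  // SUCCESS appears in the diff; FAILURE and UNKNOWN are assumed for illustration.
  enum CommitStatus { SUCCESS, FAILURE, UNKNOWN }

  // Hypothetical stand-in for a freshly reloaded table: the current metadata
  // file location plus the locations of earlier metadata files.
  static final class ReloadedMetadata {
    final String currentLocation;
    final List<String> previousLocations;

    ReloadedMetadata(String currentLocation, List<String> previousLocations) {
      this.currentLocation = currentLocation;
      this.previousLocations = previousLocations;
    }
  }

  // Decide whether our commit landed, given what a fresh reload shows.
  // A null argument models "the reload itself failed", so the outcome stays unknown.
  static CommitStatus checkCommitStatus(String newMetadataLocation, ReloadedMetadata reloaded) {
    if (reloaded == null) {
      return CommitStatus.UNKNOWN;
    }
    if (newMetadataLocation.equals(reloaded.currentLocation)) {
      return CommitStatus.SUCCESS;
    }
    // A second committer may already have committed on top of our commit,
    // in which case our location only shows up among the past locations.
    if (reloaded.previousLocations.contains(newMetadataLocation)) {
      return CommitStatus.SUCCESS;
    }
    return CommitStatus.FAILURE;
  }

  public static void main(String[] args) {
    String ours = "metadata/00002-ours.metadata.json";
    ReloadedMetadata overwritten = new ReloadedMetadata(
        "metadata/00003-theirs.metadata.json",
        Arrays.asList("metadata/00001-base.metadata.json", ours));
    // Our commit was applied, then another committer committed on top of it.
    System.out.println(checkCommitStatus(ours, overwritten));
  }
}
```

       Checking only the current location would report a failure in the "committed on top of" case, which is why the past locations are searched as well.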

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -198,8 +206,22 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
 
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled);
 
-      persistTable(tbl, updateHiveTable);
-      threw = false;
+      try {
+        persistTable(tbl, updateHiveTable);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {
+        LOG.error("Cannot tell if commit succeeded, attempting to reconnect and check", persistFailure);
+        commitStatus = checkCommitStatus(newMetadataLocation);

Review comment:
       There are retries built into the `HiveClientPool`. Any thoughts on why an
additional retry helps, @aokolnychyi, @RussellSpitzer?

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -198,8 +206,22 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
 
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled);
 
-      persistTable(tbl, updateHiveTable);
-      threw = false;
+      try {
+        persistTable(tbl, updateHiveTable);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {

Review comment:
       When this exception occurred in prod, what sort of exception did you
see?
   I would have imagined that any `TException` or `MetaException` implies we did
get some error response from the Metastore. A network partition, for example,
would show up as some sort of socket exception (e.g. socket closed), as the
`HiveClientPool` would try reconnecting 3 times.






