aokolnychyi commented on a change in pull request #2328:
URL: https://github.com/apache/iceberg/pull/2328#discussion_r593358075



##########
File path: 
api/src/main/java/org/apache/iceberg/exceptions/CommitStateUnknownException.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.exceptions;
+
+/**
+ * Exception for a failure to confirm either affirmatively or negatively that 
a commit was applied. The client
+ * cannot take any further action without possibly corrupting the table.
+ */
+public class CommitStateUnknownException extends RuntimeException {
+
+  private static final String commonInfo =

Review comment:
       nit: `commonInfo` -> `COMMON_INFO`

##########
File path: 
api/src/main/java/org/apache/iceberg/exceptions/CommitStateUnknownException.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.exceptions;
+
+/**
+ * Exception for a failure to confirm either affirmatively or negatively that 
a commit was applied. The client
+ * cannot take any further action without possibly corrupting the table.
+ */
+public class CommitStateUnknownException extends RuntimeException {
+
+  private static final String commonInfo =
+      "\nCannot determine whether the commit was successful or not, the 
underlying data files may or " +

Review comment:
       nit: shall we add "\n" directly below to keep the constant simple?
   
   ```
   cause.getMessage() + "\n" + COMMON_INFO
   ```
   
   

##########
File path: 
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -199,29 +203,68 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), 
removedProps, hiveEngineEnabled);
 
       persistTable(tbl, updateHiveTable);
-      threw = false;
     } catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
+      canCleanupMetadata = true;
       throw new AlreadyExistsException("Table already exists: %s.%s", 
database, tableName);
 
     } catch (TException | UnknownHostException e) {
       if (e.getMessage() != null && e.getMessage().contains("Table/View 
'HIVE_LOCKS' does not exist")) {
+        canCleanupMetadata = true;
         throw new RuntimeException("Failed to acquire locks from metastore 
because 'HIVE_LOCKS' doesn't " +
             "exist, this probably happened when using embedded metastore or 
doesn't create a " +
-            "transactional meta table. To fix this, use an alternative 
metastore", e);
+            "transactional meta table. To fix this, use an alternative 
metastore.\n%s", e);
       }
 
-      throw new RuntimeException(String.format("Metastore operation failed for 
%s.%s", database, tableName), e);
+      RuntimeException metastoreException =
+              new RuntimeException(String.format("Metastore operation failed 
for %s.%s", database, tableName), e);
+      if (checkCommitSuccessful(newMetadataLocation, metastoreException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw metastoreException;
+      }
 
     } catch (InterruptedException e) {
-      Thread.currentThread().interrupt();
-      throw new RuntimeException("Interrupted during commit", e);
 
+      Thread.currentThread().interrupt();
+      RuntimeException interruptException = new RuntimeException("Interrupted 
during commit", e);
+      if (checkCommitSuccessful(newMetadataLocation, interruptException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw interruptException;
+      }
     } finally {
-      cleanupMetadataAndUnlock(threw, newMetadataLocation, lockId);
+      cleanupMetadataAndUnlock(canCleanupMetadata, newMetadataLocation, 
lockId);

Review comment:
       @pvary @marton-bod, we only handle `TException` and 
`UnknownHostException` here. Is there any chance we get another exception while 
we do `alter_table`?

##########
File path: 
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -199,29 +203,68 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), 
removedProps, hiveEngineEnabled);
 
       persistTable(tbl, updateHiveTable);
-      threw = false;
     } catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
+      canCleanupMetadata = true;
       throw new AlreadyExistsException("Table already exists: %s.%s", 
database, tableName);
 
     } catch (TException | UnknownHostException e) {

Review comment:
       I wonder whether this place is too generic. Apart from handling 
exceptions in alter_table (i.e. the actual commit), we also catch any 
exceptions during loading and locking. We don't necessarily have to do these 
checks in cases where we did not attempt to commit a new version.
   
   Is it a crazy idea to only add this check in `persistTable`, the one that 
does the actual commit?

##########
File path: 
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -199,29 +203,68 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), 
removedProps, hiveEngineEnabled);
 
       persistTable(tbl, updateHiveTable);
-      threw = false;
     } catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
+      canCleanupMetadata = true;
       throw new AlreadyExistsException("Table already exists: %s.%s", 
database, tableName);
 
     } catch (TException | UnknownHostException e) {
       if (e.getMessage() != null && e.getMessage().contains("Table/View 
'HIVE_LOCKS' does not exist")) {
+        canCleanupMetadata = true;
         throw new RuntimeException("Failed to acquire locks from metastore 
because 'HIVE_LOCKS' doesn't " +
             "exist, this probably happened when using embedded metastore or 
doesn't create a " +
-            "transactional meta table. To fix this, use an alternative 
metastore", e);
+            "transactional meta table. To fix this, use an alternative 
metastore.\n%s", e);
       }
 
-      throw new RuntimeException(String.format("Metastore operation failed for 
%s.%s", database, tableName), e);
+      RuntimeException metastoreException =
+              new RuntimeException(String.format("Metastore operation failed 
for %s.%s", database, tableName), e);
+      if (checkCommitSuccessful(newMetadataLocation, metastoreException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw metastoreException;
+      }
 
     } catch (InterruptedException e) {
-      Thread.currentThread().interrupt();
-      throw new RuntimeException("Interrupted during commit", e);
 
+      Thread.currentThread().interrupt();
+      RuntimeException interruptException = new RuntimeException("Interrupted 
during commit", e);
+      if (checkCommitSuccessful(newMetadataLocation, interruptException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw interruptException;
+      }
     } finally {
-      cleanupMetadataAndUnlock(threw, newMetadataLocation, lockId);
+      cleanupMetadataAndUnlock(canCleanupMetadata, newMetadataLocation, 
lockId);
+    }
+  }
+
+  /**
+   * Attempt to load the table and see if the current metadata location 
matches our new commit path. This as used as
+   * a last resort when we are dealing with exceptions which may indicate that 
our commit has failed but we are not
+   * certain if that is the case.
+   * @param newMetadataLocation the path of the new commit file
+   * @param originalFailure the exception which leads us to believe the commit 
has failed
+   * @return true if the commit was successful, false if not, and rethrows the 
original exception if we cannot
+   * determine
+   */
+  private boolean checkCommitSuccessful(String newMetadataLocation, 
RuntimeException originalFailure) {
+    try {
+      refresh();

Review comment:
       I think `refresh` already returns `TableMetadata` to us.

##########
File path: 
api/src/main/java/org/apache/iceberg/exceptions/CommitStateUnknownException.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.exceptions;
+
+/**
+ * Exception for a failure to confirm either affirmatively or negatively that 
a commit was applied. The client
+ * cannot take any further action without possibly corrupting the table.
+ */
+public class CommitStateUnknownException extends RuntimeException {
+
+  private static final String commonInfo =
+      "\nCannot determine whether the commit was successful or not, the 
underlying data files may or " +
+      "may not be needed. Manual intervention via the Remove Orphan Files 
Action can remove these " +
+      "files when a connection to the Catalog can be re-established if the 
commit was actually unsuccessful." +
+      "Please check to see whether or not your commit was successful when the 
catalog is again reachable." +

Review comment:
       nit: I'd add something `... was successful before retrying the operation 
when the catalog is again reachable. Retrying an operation without checking may 
lead to duplicated data. At this time ...`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to