dramaticlly commented on code in PR #12228:
URL: https://github.com/apache/iceberg/pull/12228#discussion_r1964589167


##########
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java:
##########
@@ -71,23 +71,23 @@ public Table loadTable(TableIdentifier identifier) {
   }
 
   @Override
-  public Table registerTable(TableIdentifier identifier, String 
metadataFileLocation) {
+  public Table registerTable(
+      TableIdentifier identifier, String metadataFileLocation, boolean 
overwrite) {
     Preconditions.checkArgument(
         identifier != null && isValidIdentifier(identifier), "Invalid 
identifier: %s", identifier);
     Preconditions.checkArgument(
         metadataFileLocation != null && !metadataFileLocation.isEmpty(),
         "Cannot register an empty metadata file location as a table");
 
     // Throw an exception if this table already exists in the catalog.
-    if (tableExists(identifier)) {
+    if (tableExists(identifier) && !overwrite) {
       throw new AlreadyExistsException("Table already exists: %s", identifier);
     }
 
     TableOperations ops = newTableOps(identifier);
     InputFile metadataFile = ops.io().newInputFile(metadataFileLocation);
-    TableMetadata metadata = TableMetadataParser.read(ops.io(), metadataFile);
-    ops.commit(null, metadata);
-
+    TableMetadata currentMetadata = tableExists(identifier) ? ops.current() : 
null;
+    ops.commit(currentMetadata, TableMetadataParser.read(ops.io(), 
metadataFile));

Review Comment:
   I was hoping to reuse the existing commit logic for atomicity support, and 
also better lineage to track previous/old metadata for hive table and JDBC 
tables. 
   
   As for potential conflict when base is out of date, I think that's a valid 
concern and we probably do not want this operation fail. I am thinking about 
add a retry block to help, please let me know if you feel otherwise
   
   ```java
   AtomicBoolean isRetry = new AtomicBoolean(false);
   // commit with retry
   Tasks.foreach(ops)
     .retry(COMMIT_NUM_RETRIES_DEFAULT)
     .exponentialBackoff(
         COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
         COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
         COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
         2.0 /* exponential */)
     .onlyRetryOn(CommitFailedException.class)
     .run(
         taskOps -> {
           TableMetadata base = isRetry.get() ? taskOps.refresh() : 
taskOps.current();
           isRetry.set(true);
           taskOps.commit(base, newMetadata);
         });
   ```



##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -2871,6 +2872,33 @@ public void testRegisterExistingTable() {
     assertThat(catalog.dropTable(identifier)).isTrue();
   }
 
+  @Test
+  public void testRegisterAndOverwriteExistingTable() {
+    C catalog = catalog();

Review Comment:
   Initially I think register with overwrite helps revert an existing table to 
a new previous health state. If we want to support overwrite with another 
tables's metadata, It seems better suited with drop + register, to reflect the 
table UUID change. 
   
   From the [table 
spec](https://iceberg.apache.org/spec/#table-metadata-fields), it asks 
Implementations to throw an exception if a table's UUID does not match the 
expected UUID when refreshing metadata. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to