dramaticlly commented on code in PR #12228: URL: https://github.com/apache/iceberg/pull/12228#discussion_r1964589167
########## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ########## @@ -71,23 +71,23 @@ public Table loadTable(TableIdentifier identifier) { } @Override - public Table registerTable(TableIdentifier identifier, String metadataFileLocation) { + public Table registerTable( + TableIdentifier identifier, String metadataFileLocation, boolean overwrite) { Preconditions.checkArgument( identifier != null && isValidIdentifier(identifier), "Invalid identifier: %s", identifier); Preconditions.checkArgument( metadataFileLocation != null && !metadataFileLocation.isEmpty(), "Cannot register an empty metadata file location as a table"); // Throw an exception if this table already exists in the catalog. - if (tableExists(identifier)) { + if (tableExists(identifier) && !overwrite) { throw new AlreadyExistsException("Table already exists: %s", identifier); } TableOperations ops = newTableOps(identifier); InputFile metadataFile = ops.io().newInputFile(metadataFileLocation); - TableMetadata metadata = TableMetadataParser.read(ops.io(), metadataFile); - ops.commit(null, metadata); - + TableMetadata currentMetadata = tableExists(identifier) ? ops.current() : null; + ops.commit(currentMetadata, TableMetadataParser.read(ops.io(), metadataFile)); Review Comment: I was hoping to reuse the existing commit logic for atomicity support, and also better lineage to track previous/old metadata for hive table and JDBC tables. As for potential conflict when base is out of date, I think that's a valid concern and we probably do not want this operation fail. I am thinking about add a retry block to help, please let me know if you feel otherwise ```java AtomicBoolean isRetry = new AtomicBoolean(false); // commit with retry Tasks.foreach(ops) .retry(COMMIT_NUM_RETRIES_DEFAULT) .exponentialBackoff( COMMIT_MIN_RETRY_WAIT_MS_DEFAULT, COMMIT_MAX_RETRY_WAIT_MS_DEFAULT, COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT, 2.0 /* exponential */) .onlyRetryOn(CommitFailedException.class) .run( taskOps -> { TableMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current(); isRetry.set(true); taskOps.commit(base, newMetadata); }); ``` ########## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ########## @@ -2871,6 +2872,33 @@ public void testRegisterExistingTable() { assertThat(catalog.dropTable(identifier)).isTrue(); } + @Test + public void testRegisterAndOverwriteExistingTable() { + C catalog = catalog(); Review Comment: Initially I think register with overwrite helps revert an existing table to a new previous health state. If we want to support overwrite with another tables's metadata, It seems better suited with drop + register, to reflect the table UUID change. From the [table spec](https://iceberg.apache.org/spec/#table-metadata-fields), it asks Implementations to throw an exception if a table's UUID does not match the expected UUID when refreshing metadata. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org