guykhazma commented on code in PR #12228:
URL: https://github.com/apache/iceberg/pull/12228#discussion_r2036041469


##########
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java:
##########
@@ -71,23 +70,35 @@ public Table loadTable(TableIdentifier identifier) {
   }
 
   @Override
-  public Table registerTable(TableIdentifier identifier, String metadataFileLocation) {
+  public Table registerTable(
+      TableIdentifier identifier, String metadataFileLocation, boolean overwrite) {
     Preconditions.checkArgument(
         identifier != null && isValidIdentifier(identifier), "Invalid identifier: %s", identifier);
     Preconditions.checkArgument(
         metadataFileLocation != null && !metadataFileLocation.isEmpty(),
         "Cannot register an empty metadata file location as a table");
 
-    // Throw an exception if this table already exists in the catalog.
-    if (tableExists(identifier)) {
+    // If the table already exists and overwriting is disabled, throw an exception.
+    if (tableExists(identifier) && !overwrite) {
       throw new AlreadyExistsException("Table already exists: %s", identifier);
     }
 
     TableOperations ops = newTableOps(identifier);
-    InputFile metadataFile = ops.io().newInputFile(metadataFileLocation);
-    TableMetadata metadata = TableMetadataParser.read(ops.io(), metadataFile);
-    ops.commit(null, metadata);
-
+    TableMetadata newMetadata =
+        TableMetadataParser.read(ops.io(), ops.io().newInputFile(metadataFileLocation));
+
+    TableMetadata existing = ops.current();
+    if (existing != null && overwrite) {
+      if (existing.metadataFileLocation().equals(metadataFileLocation)) {
+        LOG.info(
+            "The requested metadata matches the existing metadata. No changes will be committed.");
+        return new BaseTable(ops, fullTableName(name(), identifier), metricsReporter());
+      }
+      dropTable(identifier, false /* Keep all data and metadata files */);

Review Comment:
   I also agree that an atomic swap is the behaviour I would expect, since leaving the table in an intermediate state is problematic.
   I have some questions and suggestions:
   1. What semantics does the REST API expect here? Does it expect the operation to succeed atomically? I haven't found anything about this in the documentation.
   2. If the REST API does not require these semantics, shouldn't this be left to each catalog implementation, so that those that are able to can provide an atomic swap?
   3. I would also like to raise an alternative approach for discussion: instead of dropping and then re-registering, we could allow multiple metadata locations to be registered and have readers access them in commit order.
   This way we would first register the new metadata file and only then drop the old one (technically it would no longer be a drop, more of a cleanup).
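
   To make the concern concrete, here is a minimal sketch. It is not Iceberg code: `PointerCatalogSketch` and its method names are hypothetical, and it assumes the catalog is nothing more than an in-memory map from table name to metadata file location. It contrasts a compare-and-swap overwrite with the drop-then-register sequence in the diff above.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory catalog used only for illustration; not an Iceberg API. */
final class PointerCatalogSketch {
  // Table name -> current metadata file location.
  private final ConcurrentMap<String, String> pointers = new ConcurrentHashMap<>();

  /** Returns the current metadata location, or null if the table is not registered. */
  String current(String table) {
    return pointers.get(table);
  }

  /**
   * Overwrite as a single compare-and-swap on the pointer. A concurrent reader sees
   * either the old location or the new one, never a missing table.
   * Returns false when the requested location is already the current one.
   */
  boolean registerOverwriteAtomic(String table, String newLocation) {
    Objects.requireNonNull(newLocation, "newLocation");
    while (true) {
      String existing = pointers.get(table);
      if (existing == null) {
        if (pointers.putIfAbsent(table, newLocation) == null) {
          return true; // first registration
        }
      } else if (existing.equals(newLocation)) {
        return false; // same metadata file, nothing to commit
      } else if (pointers.replace(table, existing, newLocation)) {
        return true; // swapped old -> new in one step
      }
      // Another writer changed the pointer in between; re-read and retry.
    }
  }

  /**
   * Drop, then register. The betweenSteps hook stands in for a concurrent reader
   * or a crash that happens after the drop and before the re-registration.
   */
  void registerOverwriteDropFirst(String table, String newLocation, Runnable betweenSteps) {
    pointers.remove(table);
    betweenSteps.run();
    pointers.put(table, newLocation);
  }
}
```

   In the drop-first variant, anything that runs in the `betweenSteps` hook observes no table at all, which is the intermediate state discussed above. Whether a real catalog can offer the compare-and-swap variant depends on its backing store.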
   
   Curious to hear your thoughts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
