harshm-dev commented on a change in pull request #3257:
URL: https://github.com/apache/iceberg/pull/3257#discussion_r737138094



##########
File path: core/src/main/java/org/apache/iceberg/TableIdGenerators.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Objects;
+
+/**
+ * Contains the state of all id-generators that influence persisted data files 
of a table.
+ *
+ * <p>The contents of this object (and it's JSON representation) must be 
treated opaque, or:
+ * internal to Iceberg.
+ *
+ * <p>For example Nessie uses the returned {@code TableIdGenerators} object to 
maintain differing
+ * {@code TableMetadata}s across different references (branches and tags) but 
maintains the state
+ * of the relevant ID generators across all references. This allows 
cherry-picking and merging
+ * specific changes onto different branches.
+ */
+public final class TableIdGenerators {

Review comment:
       nit: The class isn't really generating any IDs, just holding the state 
instead. Using the suffix "IdGenerators" is confusing.

##########
File path: core/src/main/java/org/apache/iceberg/TableIdGeneratorsParser.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.JsonNode;
+import java.io.IOException;
+import java.io.StringWriter;
+import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.util.JsonUtil;
+
+public final class TableIdGeneratorsParser {
+
+  private TableIdGeneratorsParser() {
+  }
+
+  // visible for testing
+  static final String FORMAT_VERSION = TableMetadataParser.FORMAT_VERSION;
+  static final String LAST_SEQUENCE_NUMBER = 
TableMetadataParser.LAST_SEQUENCE_NUMBER;
+  static final String LAST_COLUMN_ID = TableMetadataParser.LAST_COLUMN_ID;
+  static final String LAST_PARTITION_ID = 
TableMetadataParser.LAST_PARTITION_ID;
+
+  public static String toJson(TableIdGenerators idGenerators) {
+    try (StringWriter writer = new StringWriter()) {
+      JsonGenerator generator = JsonUtil.factory().createGenerator(writer);
+      idGeneratorsJson(idGenerators, generator);

Review comment:
       Why not just directly json serialise/de-serialise the object?
   PropertyNames/validations can be achieved via appropriate annotations.
   

##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -75,7 +79,7 @@
 public class NessieCatalog extends BaseMetastoreCatalog implements 
AutoCloseable, SupportsNamespaces, Configurable {
   private static final Logger logger = 
LoggerFactory.getLogger(NessieCatalog.class);
   private static final Joiner SLASH = Joiner.on("/");
-  private NessieClient client;
+  private NessieApiV1 api;

Review comment:
       `client` is relatively a better variable name at the consumption. 
`clientApi` could be an option since the class name is also changing.

##########
File path: 
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
##########
@@ -72,21 +79,37 @@ protected String tableName() {
     return key.toString();
   }
 
+  @Override
+  protected TableMetadata validateRefreshedMetadata(TableMetadata metadata) {
+    // Update the TableMetadata with the contents of TableIdGenerators.
+    String idGenJson = table.getIdGenerators();
+    TableIdGenerators idGenerators = 
TableIdGeneratorsParser.parseJson(idGenJson);
+    return metadata.withIdGenerators(idGenerators);
+  }
+
   @Override
   protected void doRefresh() {
     try {
-      reference.refresh();
+      reference.refresh(api);
     } catch (NessieNotFoundException e) {
       throw new RuntimeException("Failed to refresh as ref is no longer 
valid.", e);
     }
     String metadataLocation = null;
     try {
-      Contents contents = client.getContentsApi().getContents(key, 
reference.getName(), reference.getHash());
-      this.table = contents.unwrap(IcebergTable.class)
-          .orElseThrow(() ->
-              new IllegalStateException("Cannot refresh iceberg table: " +
-                  String.format("Nessie points to a non-Iceberg object for 
path: %s.", key)));
-      metadataLocation = table.getMetadataLocation();
+      Contents contents = 
api.getContents().key(key).reference(reference.getReference()).get()
+          .get(key);
+      LOG.info("Contents '{}' at '{}': {}", key, reference.getReference(), 
contents);

Review comment:
       This should be a debug level log.

##########
File path: core/src/main/java/org/apache/iceberg/TableIdGenerators.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Objects;
+
+/**
+ * Contains the state of all id-generators that influence persisted data files 
of a table.
+ *
+ * <p>The contents of this object (and it's JSON representation) must be 
treated opaque, or:
+ * internal to Iceberg.
+ *
+ * <p>For example Nessie uses the returned {@code TableIdGenerators} object to 
maintain differing
+ * {@code TableMetadata}s across different references (branches and tags) but 
maintains the state
+ * of the relevant ID generators across all references. This allows 
cherry-picking and merging
+ * specific changes onto different branches.
+ */
+public final class TableIdGenerators {
+  private final int formatVersion;
+  private final int lastColumnId;
+  private final int lastAssignedPartitionId;
+  private final long lastSequenceNumber;
+
+  private TableIdGenerators(int formatVersion, int lastColumnId, int 
lastAssignedPartitionId,
+      long lastSequenceNumber) {
+    this.formatVersion = formatVersion;
+    this.lastColumnId = lastColumnId;
+    this.lastAssignedPartitionId = lastAssignedPartitionId;
+    this.lastSequenceNumber = lastSequenceNumber;
+  }
+
+  // intentionally package-private - use TableMetadata.getIdGenerators() or 
parse via TableIdGeneratorsParser
+  static TableIdGenerators of(int formatVersion, int lastColumnId, int 
lastAssignedPartitionId,

Review comment:
       Just wondering why constructor cannot be invoked directly.
   What's the intention behind using a factory method?

##########
File path: 
core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java
##########
@@ -184,13 +184,17 @@ protected void refreshFromMetadataLocation(String 
newLocation, Predicate<Excepti
             "Table UUID does not match: current=%s != refreshed=%s", 
currentMetadata.uuid(), newUUID);
       }
 
-      this.currentMetadata = newMetadata.get();
+      this.currentMetadata = validateRefreshedMetadata(newMetadata.get());
       this.currentMetadataLocation = newLocation;
       this.version = parseVersion(newLocation);
     }
     this.shouldRefresh = false;
   }
 
+  protected TableMetadata validateRefreshedMetadata(TableMetadata metadata) {

Review comment:
       From the code below, the method seems to be supplementing metadata with 
global state IDs and returning a mutated object. 
   Naming this method as validateXX doesn't feel correct.

##########
File path: 
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
##########
@@ -127,14 +155,32 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
       delete = false;
       throw new CommitStateUnknownException(ex);
     } catch (NessieNotFoundException ex) {
-      throw new RuntimeException(String.format("Commit failed: Reference %s no 
longer exist", reference.getName()), ex);
+      throw new RuntimeException(
+          String.format("Commit failed: Reference %s no longer exist", 
reference.getName()), ex);
     } finally {
       if (delete) {
         io().deleteFile(newMetadataLocation);
       }
     }
   }
 
+  private ImmutableCommitMeta.Builder buildCommitMessage(TableMetadata base, 
TableMetadata metadata) {
+    Builder commitMeta = ImmutableCommitMeta.builder();
+    String commitMessage;
+    Snapshot snapshot = metadata.currentSnapshot();
+    if (snapshot != null && (base == null || base.currentSnapshot() == null ||
+        snapshot.snapshotId() != base.currentSnapshot().snapshotId())) {
+      commitMessage = String.format("Iceberg %s against %s", 
snapshot.operation(), tableName());
+      commitMeta.putProperties("iceberg.operation", snapshot.operation());
+    } else if (base != null && metadata.currentSchemaId() != 
base.currentSchemaId()) {
+      commitMessage = String.format("Iceberg schema change against %s", 
tableName());
+    } else {
+      commitMessage = String.format("Iceberg commit against %s", tableName());
+    }
+
+    return commitMeta.message(commitMessage);

Review comment:
       This is useful, thx!
   
   However, the function is taking multiple responsibilities - initialising a 
builder, setting properties apart from identifying commit message.
   I think we should separate them out, an example version of the consuming 
side of the code -
   ```
   ImmutableCommitMeta.Builder commitMetaBuilder = 
ImmutableCommitMeta.builder();
   addOperationDetails(commitMetaBuilder, base, metadata); // this method
   NessieUtils.addCatalogOptions(commitMetaBuilder, catalogOptions);
   ```

##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -135,15 +140,19 @@ public String name() {
 
   @Override
   protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
-    TableReference pti = TableReference.parse(tableIdentifier);
+    TableReference tr = TableReference.parse(tableIdentifier.name());
+    if (tr.hasTimestamp()) {

Review comment:
       Better to use `Precoditions.checkArgument(..)`

##########
File path: 
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
##########
@@ -103,16 +126,21 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 
     boolean delete = true;
     try {
-      ImmutableIcebergTable.Builder newTable = ImmutableIcebergTable.builder();
+      ImmutableIcebergTable.Builder newTableBuilder = 
ImmutableIcebergTable.builder();
       if (table != null) {
-        newTable.from(table);
+        newTableBuilder.id(table.getId());
       }
-      newTable.metadataLocation(newMetadataLocation);
+      IcebergTable newTable = newTableBuilder
+          
.idGenerators(TableIdGeneratorsParser.toJson(metadata.getIdGenerators()))
+          .metadataLocation(newMetadataLocation)
+          .build();
 
-      Operations op = 
ImmutableOperations.builder().addOperations(Operation.Put.of(key, 
newTable.build()))
-          .commitMeta(NessieUtil.buildCommitMetadata("iceberg commit", 
catalogOptions)).build();
-      Branch branch = 
client.getTreeApi().commitMultipleOperations(reference.getAsBranch().getName(),
-          reference.getHash(), op);
+      LOG.info("Committing '{}' against '{}': {}", key, 
reference.getReference(), newTable);

Review comment:
       debug level please




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to