rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r620700102



##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -408,11 +428,50 @@ private TableMetadata applyChangesToMapping(TableMetadata metadata) {
   private static Schema applyChanges(Schema schema, List<Integer> deletes,
                                      Map<Integer, Types.NestedField> updates,
                                      Multimap<Integer, Types.NestedField> adds,
-                                     Multimap<Integer, Move> moves) {
+                                     Multimap<Integer, Move> moves,
+                                     Set<String> identifierNames) {
+    // validate existing identifier fields are not deleted
+    for (String name : identifierNames) {
+      Types.NestedField field = schema.findField(name);
+      if (field != null) {
+        Preconditions.checkArgument(!deletes.contains(field.fieldId()),
+            "Cannot delete identifier field %s. To force deletion, " +
+                "also call setIdentifierFields to update identifier fields.", field);
+      }
+    }
+
+    // apply schema changes
     Types.StructType struct = TypeUtil
         .visit(schema, new ApplyChanges(deletes, updates, adds, moves))
         .asNestedType().asStructType();
-    return new Schema(struct.fields());
+
+    // validate identifier requirements based on latest schema
+    Schema noIdentifierSchema = new Schema(struct.fields());
+    Set<Integer> validatedIdentifiers = identifierNames.stream()
+        .map(n -> validateIdentifierField(n, noIdentifierSchema))
+        .collect(Collectors.toSet());
+
+    return new Schema(struct.fields(), validatedIdentifiers);
+  }
+
+  private static int validateIdentifierField(String name, Schema schema) {

Review comment:
       I don't think this method is quite right. Each time it is called, it
indexes the parents of the schema that is passed in, and that schema was only
created so that `findField` would work. It also mixes validation with
returning the field ID of each identifier field.
   
   I think it would be cleaner to create the parent and name indexes just
once, and there is no need to create a schema just to index the names. It would
also be better to build the list of identifier IDs first, then validate that
list separately.
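
As a rough illustration of the split suggested above (not the actual Iceberg code), the sketch below resolves identifier names to IDs in one step and validates the resulting ID set in a second step, reusing indexes that are built only once. The class `IdentifierFieldSketch`, its methods, and the plain maps standing in for the name and parent indexes are all hypothetical. The single validation rule shown (every ancestor must be a struct) is an assumed example, not necessarily the set of checks this PR performs.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: plain maps stand in for indexes built once from the struct.
class IdentifierFieldSketch {

  // Step 1: resolve identifier names to field IDs using a prebuilt name index.
  static Set<Integer> resolveIds(Set<String> identifierNames, Map<String, Integer> nameToId) {
    Set<Integer> ids = new LinkedHashSet<>();
    for (String name : identifierNames) {
      Integer id = nameToId.get(name);
      if (id == null) {
        throw new IllegalArgumentException("Cannot find identifier field: " + name);
      }
      ids.add(id);
    }
    return ids;
  }

  // Step 2: validate the ID set separately, reusing a single parent index.
  // Assumed rule for illustration: every ancestor of an identifier field is a struct.
  static void validate(Set<Integer> ids, Map<Integer, Integer> idToParent, Set<Integer> structIds) {
    for (int id : ids) {
      Integer parent = idToParent.get(id);
      while (parent != null) {
        if (!structIds.contains(parent)) {
          throw new IllegalArgumentException(
              "Identifier field " + id + " is nested in non-struct field " + parent);
        }
        parent = idToParent.get(parent);
      }
    }
  }

  public static void main(String[] args) {
    // Indexes are created once, outside the per-field loop.
    Map<String, Integer> nameToId = Map.of(
        "id", 1, "location", 2, "location.lat", 3, "tags", 4, "tags.element", 5);
    Map<Integer, Integer> idToParent = Map.of(3, 2, 5, 4);
    Set<Integer> structIds = Set.of(2); // "location" is a struct, "tags" is a list

    Set<Integer> ids = resolveIds(new LinkedHashSet<>(List.of("id", "location.lat")), nameToId);
    validate(ids, idToParent, structIds);
    System.out.println("Validated identifier field IDs: " + ids);
  }
}
```

With this shape, the name lookup and the parent walk each touch the indexes they need, and neither requires constructing an intermediate `Schema`.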






