rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r620700102
##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -408,11 +428,50 @@ private TableMetadata applyChangesToMapping(TableMetadata metadata) {
private static Schema applyChanges(Schema schema, List<Integer> deletes,
Map<Integer, Types.NestedField> updates,
Multimap<Integer, Types.NestedField> adds,
- Multimap<Integer, Move> moves) {
+ Multimap<Integer, Move> moves,
+ Set<String> identifierNames) {
+ // validate existing identifier fields are not deleted
+ for (String name : identifierNames) {
+ Types.NestedField field = schema.findField(name);
+ if (field != null) {
+ Preconditions.checkArgument(!deletes.contains(field.fieldId()),
+ "Cannot delete identifier field %s. To force deletion, " +
+ "also call setIdentifierFields to update identifier fields.", field);
+ }
+ }
+
+ // apply schema changes
Types.StructType struct = TypeUtil
.visit(schema, new ApplyChanges(deletes, updates, adds, moves))
.asNestedType().asStructType();
- return new Schema(struct.fields());
+
+ // validate identifier requirements based on latest schema
+ Schema noIdentifierSchema = new Schema(struct.fields());
+ Set<Integer> validatedIdentifiers = identifierNames.stream()
+ .map(n -> validateIdentifierField(n, noIdentifierSchema))
+ .collect(Collectors.toSet());
+
+ return new Schema(struct.fields(), validatedIdentifiers);
+ }
+
+ private static int validateIdentifierField(String name, Schema schema) {
Review comment:
I don't think this method is quite right. Each time it is called, it indexes
the parents in the schema that is passed in, and that schema was created only
so that `findField` would work. It also mixes validation with returning the
field ID of each identifier field.
I think it would be cleaner to build the parent and name indexes just once;
there is no need to create a schema just to index the names. It would also be
better to create the list of identifier IDs first, then validate that ID list
separately.
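
A minimal sketch of that structure, written against hypothetical stand-in types rather than Iceberg's own classes: `Field`, `identifierIds`, `validateIdentifiers`, and `resolve` are illustrative names, and the "field and all ancestors must be required" rule is an assumed example check, not the actual set of identifier requirements. The point is only the shape: build the indexes once, collect the IDs, then validate the ID list in a separate pass.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdentifierFieldSketch {

  /** Hypothetical flattened view of one schema field; not an Iceberg class. */
  public static final class Field {
    final String fullName;
    final int id;
    final Integer parentId; // null for top-level fields
    final boolean required;

    public Field(String fullName, int id, Integer parentId, boolean required) {
      this.fullName = fullName;
      this.id = id;
      this.parentId = parentId;
      this.required = required;
    }
  }

  /** Step 1: translate identifier names to field IDs using a prebuilt name index. */
  public static Set<Integer> identifierIds(Set<String> names, Map<String, Integer> nameToId) {
    Set<Integer> ids = new LinkedHashSet<>();
    for (String name : names) {
      Integer id = nameToId.get(name);
      if (id == null) {
        // A name with no ID cannot be carried into the validation step.
        throw new IllegalArgumentException("Cannot find identifier field: " + name);
      }
      ids.add(id);
    }
    return ids;
  }

  /** Step 2: validate the ID list on its own (assumed rule: field and ancestors are required). */
  public static void validateIdentifiers(Set<Integer> ids, Map<Integer, Field> byId) {
    for (int id : ids) {
      Field field = byId.get(id);
      if (!field.required) {
        throw new IllegalArgumentException("Identifier field must be required: " + field.fullName);
      }
      // Walk up through the parent links stored in the shared ID index.
      Integer parentId = field.parentId;
      while (parentId != null) {
        Field parent = byId.get(parentId);
        if (!parent.required) {
          throw new IllegalArgumentException(
              "Identifier field " + field.fullName + " has optional parent " + parent.fullName);
        }
        parentId = parent.parentId;
      }
    }
  }

  /** Builds the name and ID indexes once, then runs the two steps separately. */
  public static Set<Integer> resolve(List<Field> fields, Set<String> identifierNames) {
    Map<String, Integer> nameToId = new HashMap<>();
    Map<Integer, Field> byId = new HashMap<>();
    for (Field f : fields) {
      nameToId.put(f.fullName, f.id);
      byId.put(f.id, f);
    }
    Set<Integer> ids = identifierIds(identifierNames, nameToId);
    validateIdentifiers(ids, byId);
    return ids;
  }
}
```

With this split, the per-name work is a single map lookup, and the validation rules can change without touching how IDs are collected.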
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.