rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r615313057
##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -317,6 +333,38 @@ public UpdateSchema unionByNameWith(Schema newSchema) {
return this;
}
+ @Override
+ public UpdateSchema addIdentifierField(String name) {
+ Types.NestedField field = schema.findField(name);
+ if (field == null) {
+ field = adds.get(TABLE_ROOT_ID).stream()
+ .filter(f -> f.name().equals(name))
+ .findAny().orElse(null);
+ }
+
+ Preconditions.checkArgument(field != null,
+ "Cannot find column of the given name in existing or newly added
columns: %s", name);
+ Preconditions.checkArgument(field.isRequired(),
+ "Cannot add column %s as an identifier field because it is not a
required field", name);
Review comment:
I don't think that this requirement is correct. In the case where we're
adding a column and then making it part of the identifier, values for all of
the existing data will be null. I don't see a reason why we can't consider, for
example, `(1, null)` to be an identifier.
The strongest argument for making this a required field is SQL's boolean
logic, where `null` is not equal to itself. But in Iceberg, we translate those
expressions so that we can use normal boolean logic throughout the library.
Equality deletes consider null equal to itself in Iceberg, just like set
operations in SQL use null-safe equality.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]