[GitHub] [iceberg] rdblue commented on a change in pull request #2465: Core: add row identifier to schema

GitBox Sat, 17 Apr 2021 15:46:31 -0700


rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r615313057




##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -317,6 +333,38 @@ public UpdateSchema unionByNameWith(Schema newSchema) {
     return this;
   }
 
+  @Override
+  public UpdateSchema addIdentifierField(String name) {
+    Types.NestedField field = schema.findField(name);
+    if (field == null) {
+      field = adds.get(TABLE_ROOT_ID).stream()
+          .filter(f -> f.name().equals(name))
+          .findAny().orElse(null);
+    }
+
+    Preconditions.checkArgument(field != null,
+        "Cannot find column of the given name in existing or newly added 
columns: %s", name);
+    Preconditions.checkArgument(field.isRequired(),
+        "Cannot add column %s as an identifier field because it is not a 
required field", name);

Review comment:
       I don't think that this requirement is correct. In the case where we're 
adding a column and then making it part of the identifier, values for all of 
the existing data will be null. I don't see a reason why we can't consider, for 
example, `(1, null)` to be an identifier.
   
   The strongest argument for making this a required field is SQL's boolean 
logic, where `null` is not equal to itself. But in Iceberg, we translate those 
expressions so that we can use normal boolean logic throughout the library. 
Equality deletes consider null equal to itself in Iceberg, just like set 
operations in SQL use null-safe equality.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #2465: Core: add row identifier to schema

Reply via email to