[GitHub] [iceberg] rdblue commented on a change in pull request #2465: Core: add row identifier to schema

GitBox Mon, 26 Apr 2021 15:26:49 -0700


rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r620692694




##########
File path: api/src/main/java/org/apache/iceberg/Schema.java
##########
@@ -158,6 +186,33 @@ public StructType asStruct() {
     return struct.fields();
   }
 
+  /**
+   * The set of identifier field IDs.
+   * <p>
+   * Identifier is a concept similar to primary key in a relational database system.
+   * A row should be unique in a table based on the values of the identifier fields.
+   * However, unlike a primary key, Iceberg identifier differs in the following ways:
+   * <ul>
+   * <li>Iceberg does not enforce the uniqueness of a row based on this identifier information.
+   * It is used for operations like upsert to define the default upsert key.</li>
+   * <li>NULL can be used as value of an identifier field. Iceberg ensures null-safe equality check.</li>
+   * <li>A nested field in a struct can be used as an identifier. For example, if there is a "last_name" field
+   * inside a "user" struct in a schema, field "user.last_name" can be set as a part of the identifier field.</li>
+   * </ul>
+   * <p>
+   * A field can be used as a part of the identifier only if:
+   * <ul>
+   * <li>its type is primitive</li>
+   * <li>it exists in the current schema, or have been added in this update</li>

Review comment:
       This point doesn't make much sense here because there is no "this 
update" and the "current schema" is this schema. You could say that these field 
IDs are guaranteed to be non-repeated primitive fields in this schema.
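
The guarantee suggested above can be sketched as a check done once when the schema is built. This is only an illustration under stated assumptions: `IdentifierSketch`, `Field`, and `validateIdentifierFields` are hypothetical names, not Iceberg classes or methods, and reading "non-repeated" as "each field ID appears at most once" (modeled here with a `Set`) is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not Iceberg classes.
class IdentifierSketch {

  // Minimal field description: an ID, a name, and whether the type is primitive.
  static final class Field {
    final int id;
    final String name;
    final boolean primitive;

    Field(int id, String name, boolean primitive) {
      this.id = id;
      this.name = name;
      this.primitive = primitive;
    }
  }

  // Checks that every identifier field ID refers to a primitive field of this schema.
  // Passing the IDs as a Set means no ID can appear twice (assumed meaning of "non-repeated").
  static void validateIdentifierFields(Set<Integer> identifierFieldIds, Map<Integer, Field> fieldsById) {
    for (Integer id : identifierFieldIds) {
      Field field = fieldsById.get(id);
      if (field == null) {
        throw new IllegalArgumentException("Identifier field ID " + id + " is not in this schema");
      }
      if (!field.primitive) {
        throw new IllegalArgumentException("Identifier field " + field.name + " is not a primitive type");
      }
    }
  }

  public static void main(String[] args) {
    Map<Integer, Field> fieldsById = new HashMap<>();
    fieldsById.put(1, new Field(1, "id", true));
    fieldsById.put(2, new Field(2, "user", false));           // struct, not primitive
    fieldsById.put(3, new Field(3, "user.last_name", true));  // nested primitive field

    // IDs 1 and 3 are both primitive fields of this schema, so this passes.
    Set<Integer> identifierFieldIds = new LinkedHashSet<>(Arrays.asList(1, 3));
    validateIdentifierFields(identifierFieldIds, fieldsById);
    System.out.println("identifier fields are valid");
  }
}
```

With a check like this at construction time, the Javadoc on the accessor would only need to state the result (the IDs refer to primitive fields in this schema) rather than the conditions under which a field may be added.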




