jackye1995 commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r612851229



##########
File path: api/src/main/java/org/apache/iceberg/UpdateSchema.java
##########
@@ -384,4 +384,25 @@ default UpdateSchema updateColumn(String name, 
Type.PrimitiveType newType, Strin
    *                                  with other additions, renames, or 
updates.
    */
   UpdateSchema unionByNameWith(Schema newSchema);
+
+  /**
+   * Add a new row identifier given the column name.
+   * <p>
+   * The column must be a required field of a primitive type.
+   * It must exist in the current schema to update, or is added as a part of 
this update.
+   *
+   * @param columnName name of the column to add as a row identifier
+   * @return this for method chaining
+   */
+  UpdateSchema addRowIdentifier(String columnName);

Review comment:
       > I'd use remove rather than delete because it is more clear that the 
field itself will not be deleted
   
   Sounds good, I also called it `remove` in the doc, `delete` was trying to 
match `deleteColumn`, but I agree it would cause some confusion.
   
   > What about also adding setIdentifierFields
   
   I thought about `setIdentifierFields`, the reason I did not add it was 
because the `UpdateSchema` interface seems to be all individual operations 
instead of bulk operations (except for `unionByNameWith`), and our DDL SQL 
statements are also not bulk updates. 
   
   > I think that this will tend to be idempotent
   
   I think both approaches are idempotent.  you can write it as:
   
   ```
   update.setIdentifierFields(Sets.newHashSet("profile_id")).apply();
   ```
   
   or 
   
   ```
   
update.addIdentifierField("profile_id").removeIdentifierField("account_id").apply();
   ```
   
   I am fine with both approaches. Technically if we go with 
`setIdentifierFields`, then we do not need `addIdentifierField` and 
`removeIdentifierField` and I can try to figure out the diff for every update. 
   
   @openinx do you think there is any advantage for any particular approach for 
the update API?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to