[GitHub] [iceberg] rdblue commented on a change in pull request #2465: Core: add row identifier to schema

GitBox Tue, 20 Apr 2021 17:48:37 -0700


rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r617127118




##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -317,6 +320,31 @@ public UpdateSchema unionByNameWith(Schema newSchema) {
     return this;
   }
 
+  @Override
+  public UpdateSchema setIdentifierFields(Set<String> names) {

Review comment:
       I'm rethinking this; sorry for the churn.
   
   We should actually be able to do this with a single string and no parent 
name. When `add` or `set` is called, the column must already exist. We ensure 
that each name is unique: `a.b` is either a field named `a.b` or is a field 
named `b` within field `a`, but never both. Because of that, the name here is 
never ambiguous. We will always be able to look it up. So supporting 
`Collection<String>` or `String...` should work as long as we track two things: 
existing identifier fields as IDs and new identifier fields as names. Then, when 
we create the identifier list later, we can resolve the names to IDs.
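
   The bookkeeping described above could be sketched roughly as follows. This 
is only an illustration under stated assumptions: `IdentifierFieldTracker`, 
`addIdentifierField`, and `resolve` are hypothetical names that do not come from 
the PR, and a plain `Map` from full field name to field ID stands in for the 
schema's name index.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual SchemaUpdate implementation.
class IdentifierFieldTracker {
  // Assumption: full field name -> field ID. Each full name (e.g. "a.b") maps to one field.
  private final Map<String, Integer> idsByFullName;
  // Identifier fields already present in the schema, tracked as IDs.
  private final Set<Integer> existingIds = new LinkedHashSet<>();
  // Identifier fields requested during this update, tracked as names.
  private final Set<String> pendingNames = new LinkedHashSet<>();

  IdentifierFieldTracker(Map<String, Integer> idsByFullName, Collection<Integer> currentIds) {
    this.idsByFullName = idsByFullName;
    this.existingIds.addAll(currentIds);
  }

  // Replaces the whole identifier set with the given names.
  IdentifierFieldTracker setIdentifierFields(Collection<String> names) {
    existingIds.clear();
    pendingNames.clear();
    pendingNames.addAll(names);
    return this;
  }

  // Adds one name on top of whatever is already tracked.
  IdentifierFieldTracker addIdentifierField(String name) {
    pendingNames.add(name);
    return this;
  }

  // Builds the final ID set, resolving pending names at the last moment.
  Set<Integer> resolve() {
    Set<Integer> ids = new LinkedHashSet<>(existingIds);
    for (String name : pendingNames) {
      Integer id = idsByFullName.get(name);
      if (id == null) {
        throw new IllegalArgumentException("Cannot find identifier field: " + name);
      }
      ids.add(id);
    }
    return ids;
  }
}
```

   Because each full name in the index maps to exactly one field, a single 
string such as `a.b` is enough to identify a nested field, and no separate 
parent name is needed.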
   
   Does that sound reasonable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



