rdblue commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-816337252


   @jackye1995, looking at this, I think that we can simplify it quite a bit 
since we are no longer tracking multiple versions of the identifier fields.
   
   I started by looking at `RowKeyIdentifierField`, which is basically an 
`Integer`, then at `RowKey`, which is basically a `Collection<Integer>` or 
`Set<Integer>`. I think that we can remove those two classes fairly easily. 
Next, this requires quite a few API changes to expose the row key along side a 
schema, when I think most SQL tables consider those part of the same thing. If 
we also consider the identifier fields to be part of the schema, then we no 
longer need so many changes.
   
   For example, we can reuse the `UpdateSchema` operation to set the schema's 
identifier fields, `setIdentifierFields(String...)`. We can also expose 
`Schema.identifierFieldIds()` to access them. Not requiring a new operation on 
`Table` or modifying `TableMetadata` is a plus. But it also makes sense to do 
this: then we could have a single operation that adds `profile_id` and also 
adds it to the table's identifier fields at the same time.
   
   Combining the identifier fields with schema also causes them to be versioned 
with schema -- we'd need to update `Schema` so that the identifier fields are 
part of equality -- but then we would be able to detect cases like @openinx 
raised. While we wouldn't roll back the schema/identifier update at the same 
time as the snapshot without a transaction, this would allow us to easily 
detect that the bad snapshot used the updated identifier, or detect which 
snapshots in a table used an updated identifier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to