rdblue commented on pull request #2354: URL: https://github.com/apache/iceberg/pull/2354#issuecomment-816337252
@jackye1995, looking at this, I think that we can simplify it quite a bit since we are no longer tracking multiple versions of the identifier fields. I started by looking at `RowKeyIdentifierField`, which is basically an `Integer`, then at `RowKey`, which is basically a `Collection<Integer>` or `Set<Integer>`. I think that we can remove those two classes fairly easily. Next, this requires quite a few API changes to expose the row key along side a schema, when I think most SQL tables consider those part of the same thing. If we also consider the identifier fields to be part of the schema, then we no longer need so many changes. For example, we can reuse the `UpdateSchema` operation to set the schema's identifier fields, `setIdentifierFields(String...)`. We can also expose `Schema.identifierFieldIds()` to access them. Not requiring a new operation on `Table` or modifying `TableMetadata` is a plus. But it also makes sense to do this: then we could have a single operation that adds `profile_id` and also adds it to the table's identifier fields at the same time. Combining the identifier fields with schema also causes them to be versioned with schema -- we'd need to update `Schema` so that the identifier fields are part of equality -- but then we would be able to detect cases like @openinx raised. While we wouldn't roll back the schema/identifier update at the same time as the snapshot without a transaction, this would allow us to easily detect that the bad snapshot used the updated identifier, or detect which snapshots in a table used an updated identifier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
