openinx commented on pull request #2354: URL: https://github.com/apache/iceberg/pull/2354#issuecomment-810708037
@rdblue Yes, if we only consider time travel over records among different snapshots, the old versions of `partitionSpec`, `sortOrder`, and `rowIdentifier` are indeed useless. Maybe I'm considering a slightly different case (still using the same example):

- `t1`: The user defines `account_id` as the row identifier;
- `t2`: Write a few records into the table;
- `t3`: Write a few equality deletions (by `account_id`) into the table;
- `t4`: Add `profile_id` to the row identifier, so the identifier is now `(account_id, profile_id)`;
- `t5`: Write a few equality deletions (by `(account_id, profile_id)`) into the table.

The user changed the `RowIdentifier` from `account_id` to `(account_id, profile_id)` at timestamp `t4`. After the writes at timestamp `t5`, they find that the `(account_id, profile_id)` deletions lead to a business error, so they plan to revert this Iceberg table to `t3` and replay all deletions whose identifier is `(account_id)`.

My question is: after reverting the table to `t3`, should people still see the incorrect row identifier `(account_id, profile_id)` by default, or should they see the correct row identifier `(account_id)` by default?

Currently, our implementation is the first one, which means people need to manually change `(account_id, profile_id)` back to `(account_id)`. If we are very clear that we will keep this behavior in the future, then we really do not need to maintain multiple versions of the row key. Otherwise, maintaining multiple versions is necessary. From my understanding, the second behavior is actually more user-friendly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
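
To make the two rollback behaviors in the comment above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `Snapshot`, `TableMetadata`, `rollback`, and the `restore_identifier` flag are hypothetical names invented for illustration and are not Iceberg's API. The assumption is that each snapshot records the id of the row identifier version that was current when it was committed, which is what "maintaining multiple versions" would allow.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    identifier_id: int  # hypothetical: identifier version in effect at commit time


@dataclass
class TableMetadata:
    identifiers: dict = field(default_factory=dict)  # version id -> tuple of columns
    snapshots: list = field(default_factory=list)
    current_identifier_id: int = -1
    current_snapshot_id: int = -1

    def set_identifier(self, *columns):
        # Keep every historical version instead of overwriting the old one.
        new_id = len(self.identifiers)
        self.identifiers[new_id] = tuple(columns)
        self.current_identifier_id = new_id

    def commit(self):
        snap = Snapshot(len(self.snapshots) + 1, self.current_identifier_id)
        self.snapshots.append(snap)
        self.current_snapshot_id = snap.snapshot_id
        return snap.snapshot_id

    def rollback(self, snapshot_id, restore_identifier):
        snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        self.current_snapshot_id = snap.snapshot_id
        if restore_identifier:
            # Second behavior: the identifier follows the snapshot.
            self.current_identifier_id = snap.identifier_id
        # First behavior: the latest identifier stays in place.

    def row_identifier(self):
        return self.identifiers[self.current_identifier_id]


def replay_timeline():
    table = TableMetadata()
    table.set_identifier("account_id")                # t1
    table.commit()                                    # t2: records
    t3 = table.commit()                               # t3: deletions by account_id
    table.set_identifier("account_id", "profile_id")  # t4
    table.commit()                                    # t5: deletions by both columns
    return table, t3
```

In this sketch, calling `table.rollback(t3, restore_identifier=False)` leaves `row_identifier()` at `("account_id", "profile_id")`, so the user has to fix it by hand, while `restore_identifier=True` brings back `("account_id",)`. The second behavior is only possible because the model keeps the older identifier versions around.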
