[GitHub] [iceberg] openinx commented on pull request #2354: Core: add row key to format v2

GitBox Tue, 30 Mar 2021 19:23:08 -0700


openinx commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-810708037



   @rdblue Yes, if we only consider time travel for records among different 
snapshots, the old versions of `partitionSpec`, `sortOrder`, and `rowIdentifier` 
are indeed useless. Maybe I'm considering a slightly different case (still using 
the same example): 
   
   `t1`: User defines `account_id` as the row identifier;
   `t2`: Write a few records into the table;
   `t3`: Write a few equality deletions (by `account_id`) into the table;
   `t4`: Add `profile_id` to the row identifier; now the identifier is 
`account_id` & `profile_id`;
   `t5`: Write a few equality deletions (by `account_id` & `profile_id`) into 
the table;
   
   Someone changed the `RowIdentifier` from `account_id` to `(account_id, 
profile_id)` at timestamp `t4`. After the writes at timestamp `t5`, they find 
that the `(account_id, profile_id)` deletions lead to a business error, so 
they plan to revert this Iceberg table to `t3` and replay all deletions whose 
identifier is `(account_id)`. 
   
   My question is: after reverting the table to `t3`, should people by default 
still see the incorrect row identifier `(account_id, profile_id)`, or should 
they see the correct row identifier `(account_id)`?
   
   Currently, our implementation does the first, which means people will need 
to manually change `(account_id, profile_id)` back to `(account_id)`. If we 
are sure that we will keep this behavior in the future, then we really do not 
need to maintain multiple versions of the row key. Otherwise, maintaining 
multiple versions is necessary. From my understanding, the second behavior is 
actually more user-friendly.
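   
   To make the two behaviors concrete, here is a minimal toy sketch in Python. 
It is not the Iceberg API and not what this PR implements: `ToyTable`, 
`Snapshot`, `row_identifier_id`, and the `restore_row_identifier` flag are all 
hypothetical names, and I'm assuming each snapshot records the id of the row 
identifier version that was current when it was committed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

RowIdentifier = Tuple[str, ...]


@dataclass
class Snapshot:
    snapshot_id: int
    # Assumption: the snapshot remembers which row identifier version
    # was the table default at commit time.
    row_identifier_id: int


@dataclass
class ToyTable:
    """Hypothetical stand-in for table metadata; not the Iceberg API."""

    row_identifiers: Dict[int, RowIdentifier] = field(default_factory=dict)
    snapshots: List[Snapshot] = field(default_factory=list)
    default_row_identifier_id: Optional[int] = None
    current_snapshot_id: Optional[int] = None

    def set_row_identifier(self, fields: Sequence[str]) -> int:
        # Keep every version instead of overwriting the previous one.
        new_id = len(self.row_identifiers)
        self.row_identifiers[new_id] = tuple(fields)
        self.default_row_identifier_id = new_id
        return new_id

    def commit(self) -> Snapshot:
        # Any write (records or equality deletions) produces a snapshot.
        snap = Snapshot(len(self.snapshots) + 1, self.default_row_identifier_id)
        self.snapshots.append(snap)
        self.current_snapshot_id = snap.snapshot_id
        return snap

    def rollback(self, snapshot_id: int, restore_row_identifier: bool) -> None:
        snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        self.current_snapshot_id = snap.snapshot_id
        if restore_row_identifier:
            # Second behavior: the default identifier follows the snapshot.
            self.default_row_identifier_id = snap.row_identifier_id
        # First behavior: the default identifier is left untouched.

    def row_identifier(self) -> RowIdentifier:
        return self.row_identifiers[self.default_row_identifier_id]


def build_example() -> ToyTable:
    table = ToyTable()
    table.set_row_identifier(["account_id"])                # t1
    table.commit()                                          # t2 -> snapshot 1
    table.commit()                                          # t3 -> snapshot 2
    table.set_row_identifier(["account_id", "profile_id"])  # t4
    table.commit()                                          # t5 -> snapshot 3
    return table
```

   With `build_example()`, calling `rollback(2, restore_row_identifier=False)` 
leaves `row_identifier()` at `("account_id", "profile_id")`, which is the 
first behavior. Calling it with `restore_row_identifier=True` returns 
`("account_id",)`, which is the second behavior, and that is only possible 
because the old identifier version was kept.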


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



