rdblue commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-811486171


   > My question is: after reverting the table to t3, should people still see 
the incorrect row identifier (account_id, profile_id) by default or people 
should see the correct row identifier (account_id) by default ?
   
   I think I see the miscommunication. I don't think there is a way to roll 
back to t3. There is a snapshot created at t2, t3, and t5. Those snapshots are 
accessible via time travel and rollback. The rest of the table metadata is 
independent so rolling back doesn't change it. To revert both the bad write and 
the configuration change, the user should roll back and then set the row 
identifier fields to just `account_id`.
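   
   The two-step fix above can be sketched with a toy model. This is an 
illustration only: `ToyTable` and its methods are hypothetical names, not 
Iceberg's API, and the timeline (a configuration change between t3 and t5, 
then a bad write at t5) is an assumption based on the scenario under discussion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; this is not Iceberg's API."""
    snapshots: Dict[str, List[dict]] = field(default_factory=dict)
    current_snapshot: Optional[str] = None
    identifier_fields: Tuple[str, ...] = ()

    def commit(self, snapshot_id: str, rows: List[dict]) -> None:
        # A data write adds a snapshot and makes it current.
        self.snapshots[snapshot_id] = list(rows)
        self.current_snapshot = snapshot_id

    def rollback_to(self, snapshot_id: str) -> None:
        # Rollback only moves the data pointer; other metadata is untouched.
        if snapshot_id not in self.snapshots:
            raise KeyError(f"no snapshot {snapshot_id!r}")
        self.current_snapshot = snapshot_id

    def set_identifier_fields(self, *names: str) -> None:
        # A metadata change; it does not create a snapshot.
        self.identifier_fields = tuple(names)


table = ToyTable(identifier_fields=("account_id",))
table.commit("t2", [{"account_id": 1, "profile_id": "a"}])
table.commit("t3", [{"account_id": 1, "profile_id": "a"},
                    {"account_id": 2, "profile_id": "b"}])
# Assumed: the incorrect configuration change happens after t3.
table.set_identifier_fields("account_id", "profile_id")
# Assumed: the bad write lands at t5.
table.commit("t5", [{"account_id": 1, "profile_id": "zzz"}])

# Step 1: roll the data back. The identifier fields are not reverted.
table.rollback_to("t3")
identifier_after_rollback = table.identifier_fields

# Step 2: fix the metadata explicitly.
table.set_identifier_fields("account_id")
```

   In this sketch the rollback alone leaves `identifier_fields` at 
(`account_id`, `profile_id`); only the explicit second step restores 
`account_id`.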
   
   Keeping table metadata and data separate (and only versioning data) is the 
right behavior, I think. Data is constantly evolving and we don't want to 
accidentally revert metadata changes -- like updating table properties -- when 
the data snapshot is rolled back.
   
   Consider a slightly different scenario where the rollback to t3 was needed 
because the source was producing bad data. Why should the `profile_id` be 
removed from the row identifier in that case? If Iceberg did that implicitly, 
then after the corrected data is turned back on, Iceberg would start deleting 
rows incorrectly using the wrong key.
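   
   A small sketch of why the wrong key deletes rows, assuming a simplified 
upsert that replaces every existing row whose key matches an incoming row. The 
`upsert` helper and the sample rows are hypothetical and only stand in for the 
behavior described above.

```python
def upsert(rows, incoming, key_fields):
    """Replace existing rows whose key matches any incoming row."""
    def key(row):
        return tuple(row[name] for name in key_fields)

    incoming_keys = {key(row) for row in incoming}
    kept = [row for row in rows if key(row) not in incoming_keys]
    return kept + list(incoming)


existing = [
    {"account_id": 1, "profile_id": "a", "value": "old"},
    {"account_id": 1, "profile_id": "b", "value": "old"},
]
update = [{"account_id": 1, "profile_id": "a", "value": "new"}]

# Intended key in this scenario: both profiles of account 1 survive.
with_full_key = upsert(existing, update, ("account_id", "profile_id"))

# Key implicitly reverted to account_id: profile "b" is dropped as well.
with_reverted_key = upsert(existing, update, ("account_id",))
```

   With the full key the table still holds two rows; with the reverted key the 
unrelated row for profile `b` is lost.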
   
   I think the right approach is to keep data as a separate dimension. Since we 
want Iceberg to be a coordination layer between multiple services that don't 
know about one another, I think it would be bad for actions that fix data to 
also make possibly unknown changes to metadata.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


