[GitHub] [iceberg] snazy commented on pull request #3257: Bump Nessie to 0.10.1 + related changes

GitBox Thu, 14 Oct 2021 02:23:04 -0700


snazy commented on pull request #3257:
URL: https://github.com/apache/iceberg/pull/3257#issuecomment-943176314



   @rdblue Sorry for the late reply. Yes, this one changes the relationship 
between Nessie and Iceberg metadata.
   
   TL;DR: the changes ensure that changes to the same table on different
branches can later be merged without producing duplicate column IDs,
partition IDs, or similar.
   
   You're right: initially (in the "early Nessie days"), every Nessie commit
held a pointer to the table metadata. This works fine until you reference the
same table on different branches and perform, for example, schema changes
(think: ALTER TABLE ADD COLUMN) on both branches. That leads to duplicate
column IDs (in other words, the same column ID used for different columns),
which can then lead to data corruption when the branches are merged.
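
To make the failure mode concrete, here is a minimal sketch in Python, assuming a heavily simplified, hypothetical metadata model (TableMeta, add_column and conflicting_ids are illustrative names, not Iceberg or Nessie APIs). Both branches start from the same metadata and each assigns the next column ID from its own copy of last_column_id, so both hand out ID 3:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical, heavily simplified stand-in for per-branch table metadata."""
    last_column_id: int
    columns: Dict[int, str] = field(default_factory=dict)


def add_column(meta: TableMeta, name: str) -> TableMeta:
    # The new ID comes from this branch's *own* copy of last_column_id.
    new_id = meta.last_column_id + 1
    return TableMeta(last_column_id=new_id, columns={**meta.columns, new_id: name})


def conflicting_ids(a: TableMeta, b: TableMeta) -> Dict[int, Tuple[str, str]]:
    # Column IDs present on both branches that refer to different columns.
    return {
        cid: (a.columns[cid], b.columns[cid])
        for cid in sorted(a.columns.keys() & b.columns.keys())
        if a.columns[cid] != b.columns[cid]
    }


base = TableMeta(last_column_id=2, columns={1: "id", 2: "name"})
main = add_column(base, "email")   # ALTER TABLE ADD COLUMN on branch "main"
dev = add_column(base, "phone")    # ALTER TABLE ADD COLUMN on branch "dev"
print(conflicting_ids(main, dev))  # {3: ('email', 'phone')}
```

Column ID 3 now means "email" on one branch and "phone" on the other, which is exactly what a merge must never see.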
   
   So the initial approach in this PR was to maintain table metadata across all
branches and "just" reference the snapshot ID in Nessie commits. That led to
other issues (I won't explain them further here, but schema changes became a
problem again).
   
   The current approach is closer to the original one from the early Nessie
days: keep the pointer to the table metadata in Nessie commits, but track the
state that matters across all Nessie branches (e.g. last-column-ID) globally.
It is currently implemented via additional functionality in TableMetadata that
retrieves this "global state" (last-column-ID, last-used-partition-ID,
last-assigned-sequence-ID) as an object that is opaque to Nessie, plus
functionality to update a TableMetadata using that "global state".
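
A minimal sketch of the "global state" idea, using a hypothetical, simplified model rather than the actual TableMetadata code in this PR: GlobalState, extract_global_state, apply_global_state and Store are made-up names. Each branch keeps its own metadata pointer, but the counters (last_column_id, last_partition_id and last_sequence_number, standing in for last-column-ID, last-used-partition-ID and last-assigned-sequence-ID) live in one shared object that is applied to a branch's metadata before new IDs are assigned:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class GlobalState:
    """Hypothetical: counters that must stay unique across all branches.
    Nessie would only store this value, never interpret it."""
    last_column_id: int
    last_partition_id: int
    last_sequence_number: int


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical, heavily simplified per-branch table metadata."""
    last_column_id: int
    last_partition_id: int
    last_sequence_number: int
    columns: Dict[int, str] = field(default_factory=dict)


def extract_global_state(meta: TableMeta) -> GlobalState:
    return GlobalState(meta.last_column_id, meta.last_partition_id,
                       meta.last_sequence_number)


def apply_global_state(meta: TableMeta, state: GlobalState) -> TableMeta:
    # Replace the branch-local counters with the global ones; keep the schema.
    return TableMeta(state.last_column_id, state.last_partition_id,
                     state.last_sequence_number, dict(meta.columns))


def with_new_column(meta: TableMeta, name: str) -> TableMeta:
    new_id = meta.last_column_id + 1
    return TableMeta(new_id, meta.last_partition_id, meta.last_sequence_number,
                     {**meta.columns, new_id: name})


class Store:
    """Hypothetical: one metadata pointer per branch plus one shared global state."""

    def __init__(self, initial: TableMeta, branches: List[str]) -> None:
        self.heads: Dict[str, TableMeta] = {b: initial for b in branches}
        self.global_state = extract_global_state(initial)

    def add_column(self, branch: str, name: str) -> int:
        # Refresh counters from the global state *before* assigning a new ID.
        meta = apply_global_state(self.heads[branch], self.global_state)
        meta = with_new_column(meta, name)
        self.heads[branch] = meta
        self.global_state = extract_global_state(meta)  # publish the new counters
        return meta.last_column_id


base = TableMeta(2, 1000, 0, {1: "id", 2: "name"})
store = Store(base, ["main", "dev"])
print(store.add_column("main", "email"))  # 3
print(store.add_column("dev", "phone"))   # 4
```

With one ALTER TABLE ADD COLUMN on each branch, "main" gets column ID 3 and "dev" gets 4, so a later merge cannot see the same ID used for two different columns. The sketch ignores concurrency; a real implementation would presumably have to update the global state atomically with the commit.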
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


