rdblue commented on issue #2481:
URL: https://github.com/apache/iceberg/issues/2481#issuecomment-846641765


   I've talked with @rymurr about the way that Nessie currently works and I 
think we generally agree that we would want to change it to use Iceberg-native 
branching and tagging.
   
   The problem with Nessie's current model is that it keeps references to 
multiple metadata files instead of tracking everything in one place. That means:
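   * We have to coordinate across metadata file versions even though Iceberg 
assumes that you don't do that. For example, this breaks the assumptions behind 
file cleanup: we compare the files that are reachable from all snapshots, but a 
file referenced only from another branch's metadata file isn't part of that 
comparison, so it could be deleted while that branch still needs it.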
   * Table state that isn't supposed to be transactional can diverge between 
branches. For example, if you add a column in a branch and write data, you will 
have assigned a new column ID and used it in a data file. If you did that in two 
branches in parallel, you'd use the same ID for two different columns. Merging 
the metadata trees may appear safe, but it isn't, because that would mix the two 
columns' data together.
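
To make both problems concrete, here is a minimal Python sketch under heavily simplified assumptions. `TableMetadata`, `Snapshot`, `add_column`, and `expire` are hypothetical names, not Iceberg or Nessie APIs, and each branch is modeled as holding its own copy of the metadata. Each copy hands out column IDs from its own `last_column_id` counter, and snapshot expiration only looks at the snapshots recorded in the copy it runs against.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Snapshot:
    snapshot_id: int
    data_files: Set[str]  # paths of data files live in this snapshot


@dataclass
class TableMetadata:
    # Hypothetical, simplified stand-in for a single metadata file.
    last_column_id: int
    columns: Dict[int, str] = field(default_factory=dict)  # field ID -> column name
    snapshots: List[Snapshot] = field(default_factory=list)

    def add_column(self, name: str) -> int:
        # The next ID comes from this copy's own counter only.
        self.last_column_id += 1
        self.columns[self.last_column_id] = name
        return self.last_column_id

    def expire(self, expired_ids: Set[int]) -> Set[str]:
        expired = [s for s in self.snapshots if s.snapshot_id in expired_ids]
        retained = [s for s in self.snapshots if s.snapshot_id not in expired_ids]
        self.snapshots = retained
        # Files reachable from expired snapshots but not from retained ones,
        # looking only at snapshots recorded in *this* metadata copy.
        return set().union(*(s.data_files for s in expired)) - set().union(
            *(s.data_files for s in retained)
        )


# Both branches start from the same metadata.
base = TableMetadata(last_column_id=2, columns={1: "id", 2: "ts"},
                     snapshots=[Snapshot(1, {"f1"})])

# Simplified per-branch model: each branch references its own metadata copy.
main = copy.deepcopy(base)
test = copy.deepcopy(base)

price_id = main.add_column("price")    # 3 on main
region_id = test.add_column("region")  # also 3 on test: an ID collision

test.snapshots.append(Snapshot(2, {"f1", "f2"}))  # test appends and still reads f1
main.snapshots.append(Snapshot(3, {"f3"}))        # main overwrites f1 with f3

# Expiring snapshot 1 on main deletes f1, which test's snapshot 2 still needs.
deleted = main.expire({1})
```

Neither copy can see the other, so nothing in this model detects the shared column ID or the file that `test` still depends on.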
   
   We can fix those issues with Iceberg-native branching and tagging. I think 
that's the right option for use cases where you want to branch from current 
tables for testing purposes.
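
For contrast, here is a minimal sketch of the single-metadata approach under the same simplifying assumptions. The `TableMetadata` and `Snapshot` classes are hypothetical, and `refs` is a hypothetical map from branch name to head snapshot ID. Every branch lives in one metadata file, so column IDs come from one shared counter and expiration checks the snapshots of all branches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Snapshot:
    snapshot_id: int
    data_files: Set[str]


@dataclass
class TableMetadata:
    # Hypothetical, simplified single metadata file that tracks every branch.
    last_column_id: int
    columns: Dict[int, str] = field(default_factory=dict)  # every ID ever assigned
    snapshots: List[Snapshot] = field(default_factory=list)
    refs: Dict[str, int] = field(default_factory=dict)  # branch -> head snapshot ID

    def add_column(self, name: str) -> int:
        # One shared counter: all branches draw IDs from the same sequence.
        self.last_column_id += 1
        self.columns[self.last_column_id] = name
        return self.last_column_id

    def expire(self, expired_ids: Set[int]) -> Set[str]:
        # In this sketch, never expire a snapshot that a branch points at.
        expired_ids = expired_ids - set(self.refs.values())
        expired = [s for s in self.snapshots if s.snapshot_id in expired_ids]
        retained = [s for s in self.snapshots if s.snapshot_id not in expired_ids]
        self.snapshots = retained
        # Retained snapshots from *all* branches are checked, so shared files survive.
        return set().union(*(s.data_files for s in expired)) - set().union(
            *(s.data_files for s in retained)
        )


meta = TableMetadata(last_column_id=2, columns={1: "id", 2: "ts"})
meta.snapshots = [Snapshot(1, {"f1"}), Snapshot(2, {"f1", "f2"}), Snapshot(3, {"f3"})]
meta.refs = {"test": 2, "main": 3}

price_id = meta.add_column("price")    # added on main
region_id = meta.add_column("region")  # added on test: gets a distinct ID
deleted = meta.expire({1})             # f1 is kept because test still reads it
```

The sketch calls `add_column` sequentially; with real tables, each commit replaces the one metadata file atomically, so a concurrent commit to another branch would have to build on the latest counter instead of reusing a stale one.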


-- 
This is an automated message from the Apache Git Service.