rdblue commented on issue #2481: URL: https://github.com/apache/iceberg/issues/2481#issuecomment-846641765
I've talked with @rymurr about the way that Nessie currently works and I think we generally agree that we would want to change it to use Iceberg-native branching and tagging.

The problem with Nessie's current model is that it keeps references to multiple metadata files instead of tracking everything in one place. That means:

* We have to coordinate across metadata file versions even though Iceberg assumes that you don't do that: for example, that breaks the file cleanup assumptions because we compare the files that are reachable from all snapshots.
* Changes that shouldn't be part of transactions may change between branches. For example, if you add a column in a branch and write data, you will have assigned a new ID and used it in a data file. If you did that in two branches in parallel, you'd use the same ID for two different columns. It may appear safe to merge the metadata trees, but it actually isn't because that would mix column data together.

We can fix those issues with Iceberg-native branching and tagging. I think that's the right option for use cases where you want to branch from current tables for testing purposes.
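To make the column ID problem concrete, here is a minimal sketch in Python. It is a toy model, not Iceberg's or Nessie's actual API: `TableMetadata`, `fork`, and `add_column` are hypothetical names, and the only assumption carried over from the comment above is that each metadata file assigns new column IDs from its own counter.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Toy stand-in for one table metadata file (hypothetical, not Iceberg's classes)."""
    last_column_id: int
    columns: dict = field(default_factory=dict)  # column ID -> column name

    def fork(self):
        # A branch that keeps its own metadata file starts from an independent copy.
        return TableMetadata(self.last_column_id, dict(self.columns))

    def add_column(self, name):
        # The next ID comes from a counter local to this metadata file.
        self.last_column_id += 1
        self.columns[self.last_column_id] = name
        return self.last_column_id


main = TableMetadata(last_column_id=2, columns={1: "id", 2: "ts"})
branch_a = main.fork()
branch_b = main.fork()

# Each branch adds a different column in parallel, unaware of the other.
id_a = branch_a.add_column("price")
id_b = branch_b.add_column("country")

# IDs present in both branches but bound to different column names.
collisions = {
    col_id: (branch_a.columns[col_id], branch_b.columns[col_id])
    for col_id in branch_a.columns.keys() & branch_b.columns.keys()
    if branch_a.columns[col_id] != branch_b.columns[col_id]
}
print(collisions)  # {3: ('price', 'country')}
```

Data files written in either branch would refer to the new column by ID 3, so merging the two metadata trees would treat `price` and `country` values as the same column. Keeping all branches in one metadata file, with a single ID counter, avoids the collision in this model.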
