rymurr opened a new issue #1808: https://github.com/apache/iceberg/issues/1808
Currently, when using Nessie with Iceberg, the only way to perform operations on branches is to use the [CLI](https://projectnessie.org/tools/cli/), as in the example in the [demo](https://github.com/projectnessie/nessie/tree/main/python/demo). Now that Iceberg supports custom SQL extensions, it would be good to add some commands that allow branches to be manipulated from Iceberg/Spark directly.

The operations we propose to support are:

1. Specify the branch/context that operations should take place on. This would change the `reference` in an existing Nessie catalog. The most natural syntax seems to be something like `USE CATALOG REFERENCE <refName> [AT <timestamp>]`, where `refName` could be a branch name, a tag name or a specific hash. The optional `timestamp` would allow time travel, and would only apply to a branch or tag.
2. Create/delete operations on branches. These would not affect the catalog directly but would create or delete a branch. Optionally, create could also switch to the new branch (e.g. via 1. above). As a strawman I propose `ALTER CATALOG CREATE [BRANCH|TAG] <refName> [AT <hash>|<ref>]` and `ALTER CATALOG DROP [BRANCH|TAG] <refName>`. There are some issues to work out around deleting the current catalog branch and around concurrency control via the `expectedHash` in the CLI.
3. Merge/assign/cherry-pick operations. These operations are used to move commits between branches. For now I would like to focus on the merge operation, which would take all commits on branch A from its HEAD back to the common ancestor with branch B and move them onto branch B; this is roughly analogous to `git merge`. It could be expressed as `ALTER CATALOG MERGE BRANCH <refName> [<toRefName>]`, where `refName` is the branch we want to merge from and `toRefName` is the branch we want to merge onto. `toRefName` is optional; if it is not provided, the command uses the current branch (as set by 1.). Note: currently a Nessie merge fails without applying any changes if there is a conflict.
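As a rough illustration of how the strawman statements above could be recognised, here is a minimal Python sketch that turns them into command objects. It uses regular expressions purely for brevity; a real implementation would more likely extend the Spark SQL extension parser. The class names, `parse_command`, the allowed characters in reference names, reading `[BRANCH|TAG]` as a required choice and writing the timestamp as a single token (e.g. ISO-8601) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical command objects for the strawman syntax (not an Iceberg or Nessie API).
@dataclass(frozen=True)
class UseReference:
    ref_name: str                    # branch name, tag name or hash
    timestamp: Optional[str] = None  # optional time-travel point


@dataclass(frozen=True)
class CreateReference:
    kind: str                     # "BRANCH" or "TAG"
    ref_name: str
    source: Optional[str] = None  # hash or reference to create from


@dataclass(frozen=True)
class DropReference:
    kind: str
    ref_name: str


@dataclass(frozen=True)
class MergeBranch:
    from_ref: str
    to_ref: Optional[str] = None  # None means "the current reference"


Command = Union[UseReference, CreateReference, DropReference, MergeBranch]

# Assumed character set for reference names and hashes.
_NAME = r"[A-Za-z0-9_./-]+"

# Each rule pairs a whole-statement pattern with a builder for its command object.
_RULES = [
    (re.compile(rf"USE\s+CATALOG\s+REFERENCE\s+({_NAME})(?:\s+AT\s+(\S+))?", re.IGNORECASE),
     lambda m: UseReference(m.group(1), m.group(2))),
    (re.compile(rf"ALTER\s+CATALOG\s+CREATE\s+(BRANCH|TAG)\s+({_NAME})(?:\s+AT\s+({_NAME}))?",
                re.IGNORECASE),
     lambda m: CreateReference(m.group(1).upper(), m.group(2), m.group(3))),
    (re.compile(rf"ALTER\s+CATALOG\s+DROP\s+(BRANCH|TAG)\s+({_NAME})", re.IGNORECASE),
     lambda m: DropReference(m.group(1).upper(), m.group(2))),
    (re.compile(rf"ALTER\s+CATALOG\s+MERGE\s+BRANCH\s+({_NAME})(?:\s+({_NAME}))?", re.IGNORECASE),
     lambda m: MergeBranch(m.group(1), m.group(2))),
]


def parse_command(sql: str) -> Command:
    """Parse one reference-management statement, or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    for pattern, build in _RULES:
        match = pattern.fullmatch(statement)
        if match:
            return build(match)
    raise ValueError(f"not a reference-management statement: {sql!r}")
```

For example, `parse_command("ALTER CATALOG MERGE BRANCH dev")` yields `MergeBranch(from_ref='dev', to_ref=None)`, leaving the merge target to be resolved from the current reference set by `USE CATALOG REFERENCE`.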
The cherry-pick operation could be expressed similarly but is slightly more complicated, as one has to specify a contiguous set of hashes. As above, some more thinking around concurrency control is needed.

These operations would have to fail gracefully when using a catalog that doesn't support branching.

Hopefully this is enough to start a discussion. I would appreciate any feedback! If it is easier, I can split the three operations into three subtasks.
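To make the merge, cherry-pick and `expectedHash` semantics discussed above a little more concrete, here is a toy, in-memory Python model of a branching catalog. It is only a sketch under stated assumptions and is not Nessie's API or algorithm: `ToyVersionStore`, its methods and `ReferenceConflict` are hypothetical, each commit records a map from table names to new metadata pointers, and a "conflict" is assumed to mean that a replayed commit touches a table that was also changed on the target branch since the common ancestor. Following the behaviour described above, a conflicting merge or cherry-pick fails without applying any changes, and a stale `expected_hash` is rejected in the style of optimistic concurrency control.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


class ReferenceConflict(Exception):
    """Raised when an operation is rejected; no branch head is moved."""


@dataclass(frozen=True)
class Commit:
    hash: str
    parent: Optional[str]
    changes: Dict[str, str]  # table name -> metadata pointer (toy model)


class ToyVersionStore:
    """Hypothetical in-memory branching catalog; not Nessie's API or algorithm."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {"root": Commit("root", None, {})}
        self.heads: Dict[str, str] = {"main": "root"}

    def _new_commit(self, parent: str, changes: Dict[str, str]) -> str:
        # Content-addressed toy hash derived from the parent and the sorted changes.
        digest = hashlib.sha1(f"{parent}:{sorted(changes.items())}".encode()).hexdigest()[:12]
        self.commits[digest] = Commit(digest, parent, dict(changes))
        return digest

    def _ancestors(self, h: Optional[str]) -> List[str]:
        chain = []
        while h is not None:
            chain.append(h)
            h = self.commits[h].parent
        return chain  # newest first, ending at "root"

    def _common_ancestor(self, a: str, b: str) -> str:
        seen = set(self._ancestors(a))
        return next(h for h in self._ancestors(b) if h in seen)

    def _check_expected(self, branch: str, expected_hash: Optional[str]) -> None:
        # Optimistic concurrency control in the spirit of the CLI's expectedHash.
        if expected_hash is not None and self.heads[branch] != expected_hash:
            raise ReferenceConflict(f"{branch} is at {self.heads[branch]}, expected {expected_hash}")

    def create_branch(self, name: str, at: str) -> None:
        target = self.heads.get(at, at)  # `at` may be a reference name or a hash
        if target not in self.commits:
            raise KeyError(at)
        self.heads[name] = target

    def commit(self, branch: str, changes: Dict[str, str], expected_hash: Optional[str] = None) -> str:
        self._check_expected(branch, expected_hash)
        self.heads[branch] = self._new_commit(self.heads[branch], changes)
        return self.heads[branch]

    def tables(self, ref: str) -> Dict[str, str]:
        state: Dict[str, str] = {}
        for h in reversed(self._ancestors(self.heads.get(ref, ref))):
            state.update(self.commits[h].changes)
        return state

    def _replay(self, target: str, picked: List[str], expected_hash: Optional[str]) -> str:
        self._check_expected(target, expected_hash)
        head = self.heads[target]
        base = self._common_ancestor(head, self.commits[picked[0]].parent)
        target_chain = self._ancestors(head)
        # Tables changed on the target since the base count as concurrent edits.
        touched = {t for h in target_chain[: target_chain.index(base)] for t in self.commits[h].changes}
        for h in picked:
            clash = touched.intersection(self.commits[h].changes)
            if clash:
                raise ReferenceConflict(f"conflict on {sorted(clash)}; nothing applied")
        for h in picked:  # replay oldest first
            head = self._new_commit(head, self.commits[h].changes)
        self.heads[target] = head  # the branch head moves once, at the end
        return head

    def merge(self, source: str, target: str, expected_hash: Optional[str] = None) -> str:
        # Take every commit on `source` from its HEAD back to the common ancestor.
        src_head = self.heads[source]
        base = self._common_ancestor(src_head, self.heads[target])
        chain = self._ancestors(src_head)
        picked = list(reversed(chain[: chain.index(base)]))
        if not picked:
            return self.heads[target]  # nothing to merge
        return self._replay(target, picked, expected_hash)

    def cherry_pick(self, target: str, hashes: List[str], expected_hash: Optional[str] = None) -> str:
        if not hashes:
            raise ValueError("no commits to cherry-pick")
        for older, newer in zip(hashes, hashes[1:]):
            if self.commits[newer].parent != older:
                raise ReferenceConflict("hashes must form a contiguous range, oldest first")
        return self._replay(target, hashes, expected_hash)
```

Because the target head is only moved after every picked commit has been checked, a rejected merge or cherry-pick leaves the branch untouched. A real server-side implementation would presumably also need that final head update to be an atomic compare-and-swap rather than a plain assignment.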
