[GitHub] [iceberg] rymurr opened a new issue #1808: SQL extensions for Nessie catalog

GitBox Mon, 23 Nov 2020 11:28:26 -0800


rymurr opened a new issue #1808:
URL: https://github.com/apache/iceberg/issues/1808



   Currently, when using Nessie with Iceberg, the only way to perform operations
on branches is to use the [CLI](https://projectnessie.org/tools/cli/), as in
the example in the
[demo](https://github.com/projectnessie/nessie/tree/main/python/demo). Now that
Iceberg supports custom SQL extensions, it would be good to add some commands
that allow branches to be manipulated from Iceberg/Spark directly.
   
   The operations we propose to support are:
   
   1. Specify the branch/context that operations should take place on. This
would change the `reference` in an existing Nessie catalog. The most natural
syntax seems to be something like `USE CATALOG REFERENCE <refName> [AT <timestamp>]`,
where `refName` could be a branch name, a tag name or a specific hash. The
optional `timestamp` would allow time travel, and would apply only to a branch
or tag.
   
   2. Create/delete operations on branches and tags. These would not affect the
catalog directly but would create or delete a branch or tag. Optionally, create
could also switch to the new branch (e.g. via 1. above). As a strawman I propose
`ALTER CATALOG CREATE [BRANCH|TAG] <refName> [AT <hash>|<ref>]` and
`ALTER CATALOG DROP [BRANCH|TAG] <refName>`. There are some issues to work out
around deleting the current catalog branch, and around concurrency control via
the `expectedHash` used in the CLI.
   
   3. Merge/assign/cherry-pick operations. These operations are used to move
commits between branches. For now I would like to focus on the merge operation,
which would take all commits on branch A from its HEAD back to the common
ancestor with branch B and move them onto branch B; this is roughly analogous to
`git merge`. Note: currently a Nessie merge fails, with no changes applied, if
there is a conflict. This could be expressed as
`ALTER CATALOG MERGE BRANCH <refName> [<toRefName>]`. Here `refName` is the
branch we want to merge from and `toRefName` is the branch we want to merge
onto. `toRefName` is optional; if it is not provided, the command will use the
current branch (as set by 1.). The cherry-pick operation could be similar but is
slightly more complicated, as one has to specify a contiguous set of hashes. As
above, some more thinking around concurrency control is needed.
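To make the proposed merge semantics concrete, here is a minimal in-memory sketch in Python. It is not Nessie's implementation or API: `Repo`, `Commit`, `create_branch`, `commit`, `merge` and `MergeConflict` are hypothetical names, each commit is modelled as a map from table name to a new metadata location, and "conflict" is assumed to mean that both branches changed the same table since their common ancestor. The merge walks the source branch back from its HEAD to the common ancestor, rejects the whole merge on conflict (leaving the target untouched), and otherwise replays those commits onto the target, oldest first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class MergeConflict(Exception):
    """Both branches changed the same table since their common ancestor."""


@dataclass(frozen=True)
class Commit:
    hash: str
    parent: Optional[str]
    changes: Dict[str, str]  # hypothetical model: table name -> new metadata location


@dataclass
class Repo:
    commits: Dict[str, Commit] = field(default_factory=dict)
    branches: Dict[str, Optional[str]] = field(default_factory=dict)  # branch -> HEAD hash
    _counter: int = 0

    def create_branch(self, name: str, at: Optional[str] = None) -> None:
        # Like the proposed CREATE BRANCH ... AT <ref>; no `at` means an empty branch.
        self.branches[name] = self.branches[at] if at is not None else None

    def commit(self, branch: str, changes: Dict[str, str]) -> str:
        self._counter += 1
        h = f"c{self._counter}"
        self.commits[h] = Commit(h, self.branches[branch], dict(changes))
        self.branches[branch] = h
        return h

    def ancestry(self, head: Optional[str]) -> List[str]:
        """Hashes from `head` back to the root, newest first."""
        out = []
        while head is not None:
            out.append(head)
            head = self.commits[head].parent
        return out

    def merge(self, from_ref: str, to_ref: str) -> None:
        src = self.ancestry(self.branches[from_ref])
        dst = self.ancestry(self.branches[to_ref])
        dst_set = set(dst)
        # Common ancestor: newest commit on the source history that the target also has.
        ancestor = next((h for h in src if h in dst_set), None)
        # Commits unique to each side, newest first, stopping before the ancestor.
        src_only = src[: src.index(ancestor)] if ancestor else src
        dst_only = dst[: dst.index(ancestor)] if ancestor else dst
        src_keys = {k for h in src_only for k in self.commits[h].changes}
        dst_keys = {k for h in dst_only for k in self.commits[h].changes}
        clash = src_keys & dst_keys
        if clash:
            # Fail before touching the target, so a conflict leaves no changes behind.
            raise MergeConflict(sorted(clash))
        # Replay the source-only commits onto the target, oldest first.
        for h in reversed(src_only):
            self.commit(to_ref, self.commits[h].changes)
```

For example, creating `etl` from `main`, committing to `etl`, and then calling `merge("etl", "main")` (the sketch's stand-in for `ALTER CATALOG MERGE BRANCH etl main`) appends copies of the `etl` commits to `main`. Replaying copies rather than moving the original hashes is a simplification chosen for this sketch; because it records no merge history, merging the same branch a second time would be reported as a conflict.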
   
   These operations would have to fail gracefully when using a catalog that 
doesn't support branching.
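The graceful-failure requirement can be sketched in Python as well: recognise the strawman statements from this issue, return `None` for anything else so it can be handed to the regular SQL parser, and raise a clear error before running an extension command against a catalog without branch support. The regular expressions, the `Command` record and the `supports_branching` flag are all hypothetical, not Iceberg or Nessie APIs. The sketch reads `[BRANCH|TAG]` as a required choice and assumes timestamps contain no spaces (e.g. ISO-8601 `2020-11-23T10:00:00Z`).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for the strawman grammar in this issue (case-insensitive).
USE_RE = re.compile(r"USE\s+CATALOG\s+REFERENCE\s+(\S+)(?:\s+AT\s+(\S+))?\s*$", re.I)
CREATE_RE = re.compile(
    r"ALTER\s+CATALOG\s+CREATE\s+(BRANCH|TAG)\s+(\S+)(?:\s+AT\s+(\S+))?\s*$", re.I
)
DROP_RE = re.compile(r"ALTER\s+CATALOG\s+DROP\s+(BRANCH|TAG)\s+(\S+)\s*$", re.I)
MERGE_RE = re.compile(r"ALTER\s+CATALOG\s+MERGE\s+BRANCH\s+(\S+)(?:\s+(\S+))?\s*$", re.I)


@dataclass(frozen=True)
class Command:
    op: str                     # "use", "create", "drop" or "merge"
    ref: str                    # refName
    kind: Optional[str] = None  # "BRANCH" or "TAG" for create/drop
    arg: Optional[str] = None   # timestamp (use), AT ref/hash (create), toRefName (merge)


def parse(sql: str) -> Optional[Command]:
    """Return a Command for a recognised extension statement, else None."""
    s = sql.strip().rstrip(";")
    if m := USE_RE.match(s):
        return Command("use", m.group(1), arg=m.group(2))
    if m := CREATE_RE.match(s):
        return Command("create", m.group(2), kind=m.group(1).upper(), arg=m.group(3))
    if m := DROP_RE.match(s):
        return Command("drop", m.group(2), kind=m.group(1).upper())
    if m := MERGE_RE.match(s):
        return Command("merge", m.group(1), arg=m.group(2))
    return None  # not an extension statement: leave it to the normal parser


class BranchingNotSupported(Exception):
    """Clear error for catalogs that have no notion of branches or tags."""


def check_supported(cmd: Command, catalog: object) -> Command:
    # `supports_branching` is a hypothetical capability flag on the catalog object.
    if not getattr(catalog, "supports_branching", False):
        raise BranchingNotSupported(
            f"{cmd.op.upper()} requires a catalog with branch/tag support"
        )
    return cmd
```

Checking the capability up front, before any reference is read or written, keeps a non-branching catalog from failing halfway through a command with an obscure error.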
   
   Hopefully this is enough to start a discussion. I would appreciate any
feedback! If it is easier, I can split the three operations into three subtasks.

