jackye1995 commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r755722797
##########
File path: site/docs/snapshot-tag-branch.md
##########
@@ -0,0 +1,148 @@
+# Snapshot Tagging and Branching
+
+Iceberg's snapshot tagging and branching feature offers users a Git-like
experience in managing table snapshots.
+Users can assign tags to snapshots, create branches, and configure customized
retention policies for them.
+
+## Example use cases
+
+### Time-based snapshot tagging
+
+Users can leverage Iceberg snapshot tagging to keep multiple versions of the
table across different points in time.
+For example, a table can be configured to keep all snapshots from the last 24
hours, then one tagged snapshot per day, per week, per month, etc.
+The daily snapshots are retained for 1 week, weekly snapshots are retained for
1 month, monthly snapshots are retained for 1 year, etc.
+
+### Critical snapshot maintenance branch
+
+There are snapshots that are critical for legal or business reasons, such as
the yearly snapshots used for financial auditing.
+Because they are kept for an extended period of time (maybe even forever),
data files in the table are commonly compacted and encrypted with periodic key
rotation.
+Occasionally, rows in the snapshot also have to be deleted or updated to
satisfy GDPR requirements.
+Users can create an Iceberg branch for such snapshots so that they have their
own independent lifecycle.
+
+### Experimental branch
+
+An experimental branch is useful for many user groups, including:
+
+1. Data scientists and ML researchers can easily create an Iceberg branch to
experiment with table data without worrying about polluting the main table
snapshot.
+2. Data engineers can perform production A/B testing against the experimental
branch to ensure the correctness of certain table updates.
+3. Data producers can perform a test load into a table on an experimental
branch, and then append all the loaded files back to the main branch (similar
to Git cherry-pick).
Review comment:
I would imagine it to be similar to the Spark procedure
`cherrypick_snapshot`. It does not enforce any requirements; you need to know
exactly what you are doing when you cherry-pick.
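
For reference, a minimal sketch of how the existing `cherrypick_snapshot`
procedure is invoked from Spark SQL, which is the behavior the comparison
above refers to. The helper `cherrypick_sql`, the catalog name `my_catalog`,
the table `db.events`, and the snapshot ID are all hypothetical and only
illustrate the shape of the call; this is not the API proposed in this PR.

```python
def cherrypick_sql(catalog: str, table: str, snapshot_id: int) -> str:
    """Build a CALL statement for Iceberg's cherrypick_snapshot Spark procedure.

    Hypothetical helper for illustration only. The procedure takes the table
    identifier and the ID of the snapshot to cherry-pick.
    """
    if isinstance(snapshot_id, bool) or not isinstance(snapshot_id, int):
        raise TypeError("snapshot_id must be an integer")
    # Escape single quotes so the table name stays a valid SQL string literal.
    escaped_table = table.replace("'", "''")
    return (
        f"CALL {catalog}.system.cherrypick_snapshot("
        f"'{escaped_table}', {snapshot_id})"
    )


# Assumed names: my_catalog, db.events, and the snapshot ID are placeholders.
statement = cherrypick_sql("my_catalog", "db.events", 1234)
print(statement)
```

With a Spark session configured for an Iceberg catalog, the statement would be
run as `spark.sql(statement)`. The procedure itself does not check that the
picked snapshot makes sense for the target, so the caller has to choose the
snapshot ID deliberately.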
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]