jackye1995 commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r756403480



##########
File path: site/docs/snapshot-tag-branch.md
##########
@@ -0,0 +1,168 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Snapshot Tagging and Branching
+
+The Iceberg snapshot tagging and branching feature gives users more 
functionality for managing the Iceberg snapshot lifecycle.
+Users can assign tags to snapshots, create new branches, set the current 
branch for reads and writes, and configure customized retention policies for them.
+
+## Example use cases
+
+### Time-based snapshot tagging
+
+Users can leverage Iceberg snapshot tagging to keep multiple versions of the 
table across different points in time.
+For example, a table can be configured to keep all snapshots within 24 hours, 
then 1 tagged snapshot per day, per week, per month, etc.
+The daily snapshots are retained for 1 week, weekly snapshots are retained for 
1 month, monthly snapshots are retained for 1 year, etc.
+
+### Critical snapshot maintenance branch
+
+There are snapshots that are critical for legal or business reasons, such as 
the yearly snapshots used for financial auditing.
+Because they are kept for an extended period of time (maybe even forever), 
data files in the table are commonly compacted and encrypted with periodic key 
rotation.
+Occasionally, rows in the snapshot also have to be deleted or updated to 
satisfy GDPR requirements.
+Users can create an Iceberg branch for such snapshots to give them an 
independent lifecycle.
+
+### Experimental Branch
+
+An experimental branch is useful for many user groups, including:
+
+1. Data scientists and ML researchers can easily create an Iceberg branch to 
experiment with table data without worrying about polluting the main table 
snapshot.
+2. Data engineers can perform production AB testing against the experimental 
branch to ensure the correctness of certain table updates.
+3. Data producers can perform a test load into a table on an experimental branch, 
and then append all the loaded files back to the main branch (similar to Git 
cherry-pick).
+
+!!!Note
+    Iceberg does not plan to offer a Git-like merge operation through 
branching.
+    Merging arbitrary changes requires a lot of work to keep track of the 
intent of the commit and the context. 
+    Merging in a table is actually committing a transaction. The expectation 
is different from a merge in Git, where the lack of a conflict is the 
definition of "correct". 
+    In a table, the lack of a file conflict does not mean that the transaction 
can be committed.
+    In addition, longer transaction lengths from branch-like behavior 
dramatically increase the likelihood that the transaction could fail.
+    The merge feature would likely be supported through multi-table 
transactions in the future.
+
+## Snapshot Reference
+
+In version control systems like Git, branches and tags are both references to 
commits.
+In Iceberg, we use a similar concept of **Snapshot Reference** to implement 
branching and tagging.

Review comment:
       I will remove all usages of "we".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


