rdblue commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r766982901



##########
File path: site/docs/spec.md
##########
@@ -566,6 +566,38 @@ Notes:
 1. An alternative, *strict projection*, creates a partition predicate that 
will match a file if all of the rows in the file must match the scan predicate. 
These projections are used to calculate the residual predicates for each file 
in a scan.
 2. For example, if `file_a` has rows with `id` between 1 and 10 and a delete 
file contains rows with `id` between 1 and 4, a scan for `id = 9` may ignore 
the delete file because none of the deletes can match a row that will be 
selected.
 
+#### Snapshot Reference
+
+Iceberg tables keep track of branches and tags using snapshot references. 
+Tags are labels for individual snapshots. Branches are mutable named 
references that can be updated by committing a new snapshot as the branch's 
referenced snapshot using the [Commit Conflict Resolution and 
Retry](#commit-conflict-resolution-and-retry) procedures.
+
+The snapshot reference object records all the information of a reference 
including snapshot ID, reference type and [Snapshot Retention 
Policy](#snapshot-retention-policy).
+
+| v2         | Field name                   | Type      | Description |
+| ---------- |------------------------------|-----------|-------------|
+| _required_ | **`snapshot-id`**            | `long`    | The ID of the 
snapshot referenced |
+| _required_ | **`type`**                   | `string`  | Type of the 
reference, `tag` or `branch` |
+| _optional_ | **`min-snapshots-to-keep`**  | `int`     | For `branch` type 
only, a positive number for the minimum number of snapshots to keep in a branch 
while expiring snapshots. Defaults to the value of table property 
`history.expire.min-snapshots-to-keep` when evaluated |
+| _optional_ | **`max-snapshot-age-ms`**    | `long`    | For `branch` type 
only, a positive number for the max age of snapshots to keep in a branch while 
expiring snapshots. Defaults to the value of table property 
`history.expire.max-snapshot-age-ms` when evaluated |
+| _optional_ | **`max-ref-age-ms`**         | `long`    | For snapshot 
references except the `main` branch, a positive number for the max age of the 
snapshot reference to keep while expiring snapshots. Defaults to the value of 
table property `history.expire.max-ref-age-ms` when evaluated. The `main` 
branch never expires. |
+
+Valid snapshot references are stored as the values of the `refs` map in table 
metadata. For serialization, see Appendix C.
+
+#### Snapshot Retention Policy
+
+Table snapshots expire and are removed from metadata to allow removed or 
replaced data files to be physically deleted.
+The snapshot expiration procedure removes snapshots from table metadata and 
applies the table's retention policy.
+Retention policies can be configured both at the table level and on individual 
snapshot references through the properties `min-snapshots-to-keep`, 
`max-snapshot-age-ms` and `max-ref-age-ms`.
+
+When expiring snapshots, the retention policies of the table and its snapshot 
references are evaluated as follows:
+
+1. Start with an empty set of snapshots to retain
+2. Remove any refs (other than `main`) where the referenced snapshot is older 
than `max-ref-age-ms`
+3. For each branch and tag, add the referenced snapshot to the retained set
+4. For each branch, add its ancestors to the retained set until:
+    1. The snapshot is older than `max-snapshot-age-ms`, AND
+    2. The snapshot is not one of the first `min-snapshots-to-keep` in the 
branch (including the branch's referenced snapshot)
+5. Expire any snapshot not in the set of snapshots to retain.
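
The five evaluation steps above can be sketched in Python as follows. This is an illustrative sketch only, not code from the pull request or from Iceberg itself: the `Snapshot` and `SnapshotRef` classes, the `expire_snapshots` function and all of its parameters are hypothetical names. It assumes that a snapshot's age is `now_ms` minus its timestamp, that "older than" means strictly greater, and that the caller passes the table-level defaults explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Snapshot:
    # Hypothetical minimal view of a snapshot: ID, parent ID, commit time.
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int


@dataclass
class SnapshotRef:
    # Mirrors the fields of the snapshot reference table above.
    snapshot_id: int
    type: str  # "tag" or "branch"
    min_snapshots_to_keep: Optional[int] = None
    max_snapshot_age_ms: Optional[int] = None
    max_ref_age_ms: Optional[int] = None


def expire_snapshots(
    snapshots: Dict[int, Snapshot],
    refs: Dict[str, SnapshotRef],
    now_ms: int,
    default_min_snapshots_to_keep: int,
    default_max_snapshot_age_ms: int,
    default_max_ref_age_ms: int,
) -> Tuple[Dict[str, SnapshotRef], Set[int], Set[int]]:
    """Return (kept refs, retained snapshot IDs, expired snapshot IDs)."""
    # Step 1: start with an empty set of snapshots to retain.
    retained: Set[int] = set()

    # Step 2: drop refs (other than main) whose snapshot is older than max-ref-age-ms.
    kept_refs: Dict[str, SnapshotRef] = {}
    for name, ref in refs.items():
        max_ref_age = (ref.max_ref_age_ms if ref.max_ref_age_ms is not None
                       else default_max_ref_age_ms)
        age = now_ms - snapshots[ref.snapshot_id].timestamp_ms
        if name == "main" or age <= max_ref_age:
            kept_refs[name] = ref

    # Step 3: retain the snapshot referenced by every remaining branch and tag.
    for ref in kept_refs.values():
        retained.add(ref.snapshot_id)

    # Step 4: walk each branch's ancestry until a snapshot is both too old
    # and beyond the first min-snapshots-to-keep snapshots of the branch.
    for ref in kept_refs.values():
        if ref.type != "branch":
            continue
        min_keep = (ref.min_snapshots_to_keep if ref.min_snapshots_to_keep is not None
                    else default_min_snapshots_to_keep)
        max_age = (ref.max_snapshot_age_ms if ref.max_snapshot_age_ms is not None
                   else default_max_snapshot_age_ms)
        current = snapshots.get(ref.snapshot_id)
        position = 0  # 1-based position in the branch, counting the head
        while current is not None:
            position += 1
            too_old = now_ms - current.timestamp_ms > max_age
            if too_old and position > min_keep:
                break
            retained.add(current.snapshot_id)
            current = (snapshots.get(current.parent_id)
                       if current.parent_id is not None else None)

    # Step 5: everything not retained is expired.
    expired = set(snapshots) - retained
    return kept_refs, retained, expired
```

Because the stop condition in step 4 is a conjunction, a branch always keeps at least `min-snapshots-to-keep` snapshots no matter how old they are, and keeps any further ancestors up to the first one that exceeds `max-snapshot-age-ms`.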

Review comment:
       Looks good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


