rdblue commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r766982901
##########
File path: site/docs/spec.md
##########
@@ -566,6 +566,38 @@ Notes:
1. An alternative, *strict projection*, creates a partition predicate that
will match a file if all of the rows in the file must match the scan predicate.
These projections are used to calculate the residual predicates for each file
in a scan.
2. For example, if `file_a` has rows with `id` between 1 and 10 and a delete
file contains rows with `id` between 1 and 4, a scan for `id = 9` may ignore
the delete file because none of the deletes can match a row that will be
selected.
+#### Snapshot Reference
+
+Iceberg tables keep track of branches and tags using snapshot references.
+Tags are labels for individual snapshots. Branches are mutable named
references that can be updated by committing a new snapshot as the branch's
referenced snapshot using the [Commit Conflict Resolution and
Retry](#commit-conflict-resolution-and-retry) procedures.
+
+The snapshot reference object records all the information about a reference,
including its snapshot ID, reference type, and [Snapshot Retention
Policy](#snapshot-retention-policy).
+
+| v2 | Field name | Type | Description |
+| ---------- |------------------------------|-----------|-------------|
+| _required_ | **`snapshot-id`** | `long` | The ID of the
snapshot referenced |
+| _required_ | **`type`** | `string` | Type of the
reference, `tag` or `branch` |
+| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type
only, a positive number for the minimum number of snapshots to keep in a branch
while expiring snapshots. Defaults to the value of table property
`history.expire.min-snapshots-to-keep` when evaluated |
+| _optional_ | **`max-snapshot-age-ms`** | `long` | For `branch` type
only, a positive number for the maximum age in milliseconds of snapshots to keep
in a branch while expiring snapshots. Defaults to the value of table property
`history.expire.max-snapshot-age-ms` when evaluated |
+| _optional_ | **`max-ref-age-ms`** | `long` | For snapshot
references except the `main` branch, a positive number for the maximum age in
milliseconds of the snapshot reference to keep while expiring snapshots.
Defaults to the value of table property `history.expire.max-ref-age-ms` when
evaluated. The `main` branch never expires. |
+
+Valid snapshot references are stored as the values of the `refs` map in table
metadata. For serialization, see Appendix C.
+
+#### Snapshot Retention Policy
+
+Table snapshots expire and are removed from metadata to allow removed or
replaced data files to be physically deleted.
+The snapshot expiration procedure removes snapshots from table metadata and
applies the table's retention policy.
+The retention policy can be configured both globally and on each snapshot
reference through the properties `min-snapshots-to-keep`, `max-snapshot-age-ms`,
and `max-ref-age-ms`.
+
+When expiring snapshots, the retention policies of the table and of its
snapshot references are evaluated as follows:
+
+1. Start with an empty set of snapshots to retain
+2. Remove any refs (other than `main`) where the referenced snapshot is older
than `max-ref-age-ms`
+3. For each branch and tag, add the referenced snapshot to the retained set
+4. For each branch, add its ancestors to the retained set until reaching a
snapshot for which both of the following hold:
+ 1. The snapshot is older than `max-snapshot-age-ms`, AND
+ 2. The snapshot is not one of the first `min-snapshots-to-keep` in the
branch (including the branch's referenced snapshot)
+5. Expire any snapshot not in the set of snapshots to retain
Review comment:
Looks good.
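
For anyone reading along, the five expiration steps in the hunk above can be sketched roughly as follows. This is only an illustrative Python sketch, not the Iceberg implementation: the `Snapshot` and `SnapshotRef` classes, the `snapshots_to_retain` function, and the `now_ms` and `default_*` parameters are hypothetical names. It assumes a snapshot's age is `now_ms` minus its timestamp, that every ref points at a snapshot present in the map, and that the `default_*` arguments stand in for the `history.expire.*` table properties.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Snapshot:
    # Hypothetical minimal snapshot: ID, parent ID, and commit timestamp.
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int


@dataclass
class SnapshotRef:
    # Mirrors the fields in the snapshot reference table of the diff.
    snapshot_id: int
    type: str  # "tag" or "branch"
    min_snapshots_to_keep: Optional[int] = None
    max_snapshot_age_ms: Optional[int] = None
    max_ref_age_ms: Optional[int] = None


def snapshots_to_retain(
    snapshots: Dict[int, Snapshot],
    refs: Dict[str, SnapshotRef],
    now_ms: int,
    default_min_snapshots_to_keep: int,
    default_max_snapshot_age_ms: int,
    default_max_ref_age_ms: int,
) -> Tuple[Set[int], Dict[str, SnapshotRef]]:
    # Step 1: start with an empty set of snapshots to retain.
    retained: Set[int] = set()

    # Step 2: drop refs (other than main) whose referenced snapshot is too old.
    live_refs: Dict[str, SnapshotRef] = {}
    for name, ref in refs.items():
        max_ref_age = (ref.max_ref_age_ms if ref.max_ref_age_ms is not None
                       else default_max_ref_age_ms)
        ref_age = now_ms - snapshots[ref.snapshot_id].timestamp_ms
        if name == "main" or ref_age <= max_ref_age:
            live_refs[name] = ref

    for ref in live_refs.values():
        # Step 3: every surviving branch or tag retains its referenced snapshot.
        retained.add(ref.snapshot_id)
        if ref.type != "branch":
            continue

        min_keep = (ref.min_snapshots_to_keep
                    if ref.min_snapshots_to_keep is not None
                    else default_min_snapshots_to_keep)
        max_age = (ref.max_snapshot_age_ms
                   if ref.max_snapshot_age_ms is not None
                   else default_max_snapshot_age_ms)

        # Step 4: walk the branch's ancestors; the referenced snapshot counts
        # as position 1, so the first ancestor is position 2.
        position = 1
        current = snapshots[ref.snapshot_id].parent_id
        while current is not None and current in snapshots:
            position += 1
            snap = snapshots[current]
            too_old = now_ms - snap.timestamp_ms > max_age
            beyond_min = position > min_keep
            if too_old and beyond_min:
                break  # both stop conditions hold
            retained.add(current)
            current = snap.parent_id

    # Step 5: the caller expires every snapshot whose ID is not in `retained`.
    return retained, live_refs
```

With a linear history of five snapshots, a `main` branch at the newest one, and a tag on an older one, the function returns the retained snapshot IDs and the refs that survive step 2. Everything else would be expired.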
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]