rdblue commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r756269147
##########
File path: site/docs/spec.md
##########
@@ -566,6 +566,22 @@ Notes:
1. An alternative, *strict projection*, creates a partition predicate that
will match a file if all of the rows in the file must match the scan predicate.
These projections are used to calculate the residual predicates for each file
in a scan.
2. For example, if `file_a` has rows with `id` between 1 and 10 and a delete
file contains rows with `id` between 1 and 4, a scan for `id = 9` may ignore
the delete file because none of the deletes can match a row that will be
selected.
+#### Snapshot Reference
+
+Iceberg tables keep track of branches and tags using snapshot references.
+Tags are labels for individual snapshots. Branches are mutable named references that can be updated by committing a new snapshot as the branch's referenced snapshot using the [Commit Conflict Resolution and Retry](#commit-conflict-resolution-and-retry) procedures.
+
+The snapshot reference object records all the user-defined information of a snapshot including name, reference type and retention policy configurations.
+
+| v2 | Field name | Type | Description |
+| ---------- |------------------------------|-----------|-------------|
+| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced |
+| _required_ | **`name`** | `string` | The name of the reference, should be unique within a table. |
+| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` |
+| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type only, the minimum number of snapshots to keep in a branch, default to the current value of table property `history.expire.min-snapshots-to-keep` when this value is evaluated |
+| _optional_ | **`max-snapshot-age-ms`** | `long` | The duration before a snapshot tagged or in a branch could be expired by any automatic snapshot expiration process, default to the current value of table property `history.expire.max-snapshot-age-ms` when this value is evaluated |
Review comment:
I think that both of these (`min-snapshots-to-keep` and `max-snapshot-age-ms`) only apply to branches, because they control the snapshots that are kept behind the branch's current snapshot.

We also need a new configuration option, at the table level and at the reference level, that controls when the reference itself is removed. That is, if the referenced snapshot is older than some interval, we remove the reference itself. That helps prune old branches and tags that are no longer used. We'll probably default it to keep them forever, but it is good to have the setting for compliance needs. For that, how about `max-ref-age-ms` and `history.expire.max-ref-age-ms`?
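
To make the proposal concrete, here is a minimal sketch of how the reference-level and table-level settings might interact. It is not Iceberg code: the `SnapshotRef` class, the `refs_to_remove` function, and the timestamp map are hypothetical names for illustration, and the sketch assumes the per-reference `max-ref-age-ms` overrides the table property and that a missing value means the reference is kept forever.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Proposed table-level property name from the comment above (not yet in the spec).
TABLE_PROP = "history.expire.max-ref-age-ms"


@dataclass
class SnapshotRef:
    """Hypothetical in-memory form of a snapshot reference."""
    name: str
    snapshot_id: int
    type: str  # "tag" or "branch"
    max_ref_age_ms: Optional[int] = None  # proposed per-reference setting


def refs_to_remove(
    refs: List[SnapshotRef],
    snapshot_timestamps_ms: Dict[int, int],
    table_props: Dict[str, str],
    now_ms: int,
) -> List[str]:
    """Return names of references whose referenced snapshot is too old."""
    raw_default = table_props.get(TABLE_PROP)
    default_ms = int(raw_default) if raw_default is not None else None

    removed = []
    for ref in refs:
        # Assumption: the reference-level value wins over the table-level one.
        max_age = ref.max_ref_age_ms if ref.max_ref_age_ms is not None else default_ms
        if max_age is None:
            continue  # no limit configured anywhere: keep the reference forever
        age_ms = now_ms - snapshot_timestamps_ms[ref.snapshot_id]
        if age_ms > max_age:
            removed.append(ref.name)
    return removed
```

With no table property set, only references that carry their own `max_ref_age_ms` can be removed, which matches the suggested keep-forever default.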