ajantha-bhat commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r739932372
##########
File path: site/docs/snapshot-tag-branch.md
##########
@@ -0,0 +1,148 @@
+# Snapshot Tagging and Branching
+
+Iceberg snapshot tagging and branching feature offers user a Git-like
experience in manging table snapshots.
+Users can assign tags to snapshots, create branches and configure customized
retention policy for them.
+
+## Example use cases
+
+### Time-based Snapshot tagging
+
+Users can leverage Iceberg snapshot tagging to keep multiple versions of the
table across different points in time.
+For example, a table can be configured to keep all snapshots within 24 hours,
then 1 tagged snapshot per day, per week, per month, etc.
+The daily snapshots are retained for 1 week, weekly snapshots are retained for
1 month, monthly snapshots are retained for 1 year, etc.
+
+### Critical snapshot maintenance branch
+
+There are snapshots that are critical for legal or business reasons, such as
the yearly snapshots used for financial auditing.
+Because they are kept for an extended period of time (maybe even forever),
data files in the table are commonly compacted and encrypted with periodic key
rotation.
+Occasionally, rows in the snapshot also have to be deleted or updated to
satisfy GDPR requirements.
+Users can create an Iceberg branch for such snapshots to maintain its
independent lifecycle.
+
+### Experimental Branch
+
+An experimental branch is useful for many user groups, including:
+
+1. Data scientists and ML researchers can easily create an Iceberg branch to
experiment with table data without worrying about polluting the main table
snapshot.
+2. Data engineers can perform production AB testing against the experimental
branch to ensure the correctness of certain table updates.
+3. Data producers can perform test load in a table in an experimental branch,
and then append all the loaded files back to the main branch (similar to Git
cherry-pick).
+
+!!!Note
+ Iceberg does not plan to offer a Git-like merge operation through
branching.
+ Merging arbitrary changes requires a lot of work to keep track of the
intent of the commit and the context.
+ Merging in a table is actually committing a transaction. The expectation
is different from a merge in Git, where the lack of a conflict is the
definition of "correct".
+ In a table, the lack of a file conflict does not mean that the transaction
can be committed.
+ In addition, longer transaction lengths from branch-like behavior
dramatically increases the likelihood that the transaction could fail.
+ The merge feature would likely be supported through multi-table
transaction in the future.
+
+## Snapshot Reference
+
+In version control systems like git, branch and tag are both references of
commits.
+In Iceberg, we use a similar concept of **Snapshot Reference** to implement
branching and tagging.
+
+Each Iceberg table metadata contains a list of `refs` (references), and a
`current-branch` indicating the current branch to use.
+When user creates an Iceberg table, the first commit belongs to the default
`main` branch.
+Each snapshot reference has a uniquely identifiable name across all references
of a table.
+A snapshot can have multiple references. The exact snapshot reference spec is
documented at the [Spec](../spec/#snapshot-reference) page.
+Here we will provide some more explanations to the concepts in snapshot
reference.
+
+### Reference Type
+
+There are clearly 2 types of snapshot reference, which are `branch` and `tag`.
Their key differences are:
+
+- **New commit**: when a new snapshot is added as a child of a referenced
snapshot, tag remains on the old snapshot, but branch reference moves to the
child.
+
+- **Retention policy**: retention policy affects all the snapshots in a
branch, but only a single tagged snapshot. (More details in the next section)
+
+### Retention Policy
+
+Iceberg offers a [snapshot expiration
procedure](../spark-procedures/#expire_snapshots) to clean up snapshots that
are not needed to free up storage space.
+Retention policy can be configured both globally and on snapshot reference to
provide highly flexible customization to the expiration behavior.
+
+#### Global snapshot retention policy
+
+Global snapshot retention policy can be set through the following table
properties:
+
+| Property                             | Default            | Description                                                      |
+| ------------------------------------ | ------------------ | ---------------------------------------------------------------- |
+| history.expire.max-snapshot-age-ms   | 432000000 (5 days) | Default max age of snapshots to keep while expiring snapshots    |
+| history.expire.min-snapshots-to-keep | 1                  | Default min number of snapshots to keep while expiring snapshots |
+
+#### Snapshot reference retention policy
+
+Similarly, snapshot reference has the properties below to provider finer grain
control:
+
+| Property                     | Type      | Description |
+|------------------------------|-----------|-------------|
+| **`min-snapshots-to-keep`**  | `int`     | For `branch` type only, the minimum number of snapshots to keep in a branch |
+| **`max-snapshot-age-ms`**    | `long`    | The duration before a snapshot tagged or in a branch could be expired by any automatic snapshot expiration process |
+
+#### Policy evaluation mechanism
+
+When a snapshot expiration process starts, it follows the steps described
below:
+
+1. form an expiration candidate pool containing all snapshots
+2. for each snapshot reference, evaluate the associated policy and move
snapshots out of the candidate pool
+3. apply global retention policy to and move snapshots out of the candidate
pool
+4. when multiple snapshots can be chosen to be moved out, newer snapshots win
+4. after evaluation, expire all snapshots that are still in the candidate pool
+
+#### Policy evaluation example
+
+Here is an example for how an Iceberg snapshot expiration procedure evaluates
what snapshots to expire.
+
+Suppose we have the following snapshot graph and retention policies configured:
+
+```
+A -> B -> C (main)
+ \          (dev)
+  D -> E (b1)
+   \
+    F -> G (b2)
+```
+
+| Policy Type | Max Age | Min to Keep | Snapshots Affected |
+|------------------|---------------|-------------|-----------------------|
+| global | 5 hours | 4 | A, B, C, D, E, F, G |
+| branch/main | 3 days | 1 | A, B, C |
+| branch/b1 | 2 days | 2 | A, D, E |
+| branch/b2 | 1 day | 0 | A, D, F, G |
+| tag/dev | forever | N/A | C |
+
+Assume that we have a process continuously running the snapshot expiration
procedure, we would have the results below as time progresses:
+
+##### Day 1: F and G are expired
+
+On day 1, the global and branch b2 max age has passed, affecting A, D, F, G.
+A andD cannot be expired due to branch b1 policy, so only F and G are expired.
Review comment:
   I am also a bit confused about what happens after the expiry. Do we delete
 snapshots F and G and also drop branch b2 (that is, remove b2 from the table metadata)?
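
To make the question concrete, here is a minimal Python sketch of the state after Day 1. It only detects references whose target snapshot is gone; whether Iceberg then removes such a reference from the table metadata is exactly what the hunk leaves unstated. The `Ref` class, its field names, and `dangling_refs` are hypothetical names used for illustration, not Iceberg APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    name: str         # reference name from the example, e.g. "main" or "b2"
    kind: str         # "branch" or "tag"
    snapshot_id: str  # snapshot the reference points at (assumed field)


def dangling_refs(refs, live_snapshots):
    """Return the names of references whose target snapshot no longer exists."""
    return sorted(r.name for r in refs if r.snapshot_id not in live_snapshots)


# References from the example graph in the quoted hunk.
refs = [
    Ref("main", "branch", "C"),
    Ref("dev", "tag", "C"),
    Ref("b1", "branch", "E"),
    Ref("b2", "branch", "G"),
]

# Day 1 outcome from the doc: F and G are expired.
live_after_day_1 = {"A", "B", "C", "D", "E"}

print(dangling_refs(refs, live_after_day_1))
```

In this sketch only `b2` is left pointing at a missing snapshot, so the doc would need to say whether the expiration procedure drops that reference or leaves it for the user to clean up.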
##########
File path: site/docs/snapshot-tag-branch.md
##########
@@ -0,0 +1,148 @@
+##### Day 1: F and G are expired
+
+On day 1, the global and branch b2 max age has passed, affecting A, D, F, G.
+A andD cannot be expired due to branch b1 policy, so only F and G are expired.
Review comment:
```suggestion
A and D cannot be expired due to branch b1 policy, so only F and G are expired.
```
##########
File path: site/docs/snapshot-tag-branch.md
##########
@@ -0,0 +1,148 @@
+##### Day 1: F and G are expired
+
+On day 1, the global and branch b2 max age has passed, affecting A, D, F, G.
+A andD cannot be expired due to branch b1 policy, so only F and G are expired.
+
+```
+A -> B -> C (main)
+ \          (dev)
+  D -> E (b1)
+```
+
+##### Day 2: no snapshot is expired
+
+On day 2, branch b1 max age has also passed, affecting A, D, E.
+Because branch b1 must keep 2 snapshots, only the oldest snapshot of the
branch A could be expired.
+However, A is also in the main branch, so it cannot be expired either.
+As a result, no snapshot is expired.
+
+```
+A -> B -> C (main)
+ \          (dev)
+  D -> E (b1)
+```
+
+##### Day 3: A is expired
+
+On day 3, main branch max age has also passed, affecting A, B, C.
+Based on the global policy, we must keep 4 snapshots, so can only expire 1
snapshot,
Review comment:
   Do we really need to honour the global policy when the branch has its own
 local policy? Would it be better to delete both A and B in this case, since the
 local policy says the minimum number of snapshots to keep is 1?
 I think the global policy should apply only to snapshots that are not
 referenced by any branch or tag. WDYT?
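
The two readings can be compared side by side with a small Python sketch. It rests on several assumptions that are not in the doc: all snapshots in the example share the same age, "max age has passed" means `age >= max_age`, and the global min-to-keep acts as a floor on the table-wide snapshot count (inferred from "we must keep 4 snapshots, so can only expire 1 snapshot"). `RefPolicy`, `retained_by_ref`, `expire`, and the `global_covers_referenced` flag are hypothetical names, not Iceberg APIs.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24  # hours


@dataclass(frozen=True)
class RefPolicy:
    kind: str                # "branch" or "tag"
    lineage: tuple           # snapshots reachable from the ref, oldest first
    max_age: Optional[int]   # hours; None stands for "forever"
    min_keep: int = 0        # only meaningful for branches


def retained_by_ref(policy, live, age):
    """Snapshots that one ref's policy keeps out of the candidate pool."""
    lineage = [s for s in policy.lineage if s in live]
    if policy.max_age is None or age < policy.max_age:
        return set(lineage)
    if policy.kind == "branch" and policy.min_keep > 0:
        return set(lineage[-policy.min_keep:])  # newest snapshots win
    return set()


def expire(live, refs, global_max_age, global_min_keep, age,
           global_covers_referenced=True):
    """Return the snapshots to expire; `live` is ordered oldest first."""
    protected = set()
    for policy in refs.values():
        protected |= retained_by_ref(policy, live, age)
    candidates = [s for s in live if s not in protected]

    if global_covers_referenced:
        # Reading A (the hunk): global min-to-keep is a floor on the table total.
        if age < global_max_age:
            return set()
        while candidates and len(live) - len(candidates) < global_min_keep:
            candidates.pop()  # move the newest candidate out of the pool
        return set(candidates)

    # Reading B (this comment): ref policies alone decide referenced snapshots.
    referenced = {s for p in refs.values() for s in p.lineage}
    expired = {s for s in candidates if s in referenced}
    unreferenced = [s for s in candidates if s not in referenced]
    if age >= global_max_age:
        expired |= set(unreferenced[:max(0, len(unreferenced) - global_min_keep)])
    return expired


# Day 3 of the quoted example: F and G are already gone.
live = ["A", "B", "C", "D", "E"]
refs = {
    "main": RefPolicy("branch", ("A", "B", "C"), 3 * DAY, 1),
    "b1": RefPolicy("branch", ("A", "D", "E"), 2 * DAY, 2),
    "dev": RefPolicy("tag", ("C",), None),
}
as_documented = expire(live, refs, 5, 4, age=3 * DAY)
as_proposed = expire(live, refs, 5, 4, age=3 * DAY, global_covers_referenced=False)
print(sorted(as_documented), sorted(as_proposed))
```

Under these assumptions the documented reading expires only A on Day 3, while the proposed reading expires both A and B, which is the difference this comment is asking about.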
##########
File path: site/docs/snapshot-tag-branch.md
##########
@@ -0,0 +1,148 @@
+##### Day 1: F and G are expired
+
+On day 1, the global and branch b2 max age has passed, affecting A, D, F, G.
Review comment:
   Would it be better to include some snapshots in the example that are not
 referenced by any branch or tag, so that the global retention policy has
 something to apply to? Right now every snapshot is under a branch or tag, so the
 global policy cannot be applied. (This is based on my assumption; please see my
 last comment on this page.)
##########
File path: site/docs/spec.md
##########
@@ -581,10 +596,11 @@ Table metadata consists of the following fields:
| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp
and metadata file location pairs that encodes changes to the previous metadata
files for the table. Each time a new metadata file is created, a new entry of
the previous metadata file location should be added to the list. Tables can be
configured to remove oldest metadata log entries and keep a fixed-size log of
the most recent entries after a commit. |
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored
as full sort order objects. |
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id
of the table. Note that this could be used by writers, but is not used when
reading because reads use the specs stored in manifest files. |
+| | _optional_ | **`refs`** | A list of snapshot references, stored
as full snapshot reference objects. |
+| | _optional_ | **`current-branch`** | The name of the current
branch. If not specified, it defaults to the `main` branch that starts with the
table creation commit. |
Review comment:
   Do we need to keep each branch's latest snapshot id here? I wonder how we
 would support queries on each branch or tag.
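
As an illustration of what a reader would need, here is a minimal Python sketch that resolves a branch or tag name to a snapshot id. The `refs` and `current-branch` keys and the `main` default come from the quoted hunk; the per-reference fields `name`, `type`, and `snapshot-id`, the sample ids, and the `resolve_snapshot_id` helper are assumptions made for the sketch, since the reference object layout is not shown in this hunk.

```python
def resolve_snapshot_id(metadata, ref_name=None):
    """Look up the snapshot id for a ref, defaulting to the current branch."""
    name = ref_name or metadata.get("current-branch", "main")
    for ref in metadata.get("refs", []):
        if ref["name"] == name:
            return ref["snapshot-id"]
    raise KeyError(f"unknown snapshot reference: {name}")


# Hypothetical table metadata fragment; ids are made up for the example.
table_metadata = {
    "current-branch": "main",
    "refs": [
        {"name": "main", "type": "branch", "snapshot-id": 3},
        {"name": "dev", "type": "tag", "snapshot-id": 3},
        {"name": "b1", "type": "branch", "snapshot-id": 5},
    ],
}

print(resolve_snapshot_id(table_metadata))        # current branch
print(resolve_snapshot_id(table_metadata, "b1"))  # explicit branch
```

If each reference object carries the id of the snapshot it points at, as assumed here, a query on a branch or tag reduces to this lookup followed by an ordinary snapshot read.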