rymurr commented on a change in pull request #3425:
URL: https://github.com/apache/iceberg/pull/3425#discussion_r765196775
##########
File path: site/docs/spec.md
##########
@@ -593,16 +625,16 @@ Table metadata consists of the following fields:
| _optional_ | _required_ | **`default-spec-id`**| ID of the "current" spec
that writers should use by default. |
| _optional_ | _required_ | **`last-partition-id`**| An integer; the highest
assigned partition field ID across all partition specs for the table. This is
used to ensure partition fields are always assigned an unused ID when evolving
specs. |
| _optional_ | _optional_ | **`properties`**| A string to string map of table
properties. This is used to control settings that affect reading and writing
and is not intended to be used for arbitrary metadata. For example,
`commit.retry.num-retries` is used to control the number of commit retries. |
-| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the
current table snapshot. |
+| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the
current table snapshot; must be the same as the current ID of the `main` branch
in `refs`. |
| _optional_ | _optional_ | **`snapshots`**| A list of valid snapshots. Valid
snapshots are snapshots for which all data files exist in the file system. A
data file must not be deleted from the file system until the last snapshot in
which it was listed is garbage collected. |
| _optional_ | _optional_ | **`snapshot-log`**| A list (optional) of timestamp
and snapshot ID pairs that encodes changes to the current snapshot for the
table. Each time the current-snapshot-id is changed, a new entry should be
added with the last-updated-ms and the new current-snapshot-id. When snapshots
are expired from the list of valid snapshots, all entries before a snapshot
that has expired should be removed. |
| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp
and metadata file location pairs that encodes changes to the previous metadata
files for the table. Each time a new metadata file is created, a new entry of
the previous metadata file location should be added to the list. Tables can be
configured to remove oldest metadata log entries and keep a fixed-size log of
the most recent entries after a commit. |
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored
as full sort order objects. |
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id
of the table. Note that this could be used by writers, but is not used when
reading because reads use the specs stored in manifest files. |
+| | _optional_ | **`refs`** | A map of snapshot references. The map
keys are the unique snapshot reference names in the table, and the map values
are snapshot reference objects. There is always a `main` branch reference
pointing to the `current-snapshot-id` even if the `refs` map is null. |
Review comment:
I still feel uncomfortable about a transaction to add a tag, but I don't
see any easy way out of it. I think it would be good to have a longer-term
discussion about metadata.json as the source of truth (for everything), as I
think that is becoming less feasible. My comment on the call today about
notification settings living in metadata is related.
I guess my only question is: if we agree that it's awkward, and we agree that
catalogs have more of a role to play in the future, then how do we move on from
this proposal? It's hard to back this out of the spec or evolve it once it's in.
I know this is a hard question to reason about, and I don't want to hold this
useful feature up, but it would be good to at least think about it given the
above discussion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]