rdblue commented on code in PR #4945:
URL: https://github.com/apache/iceberg/pull/4945#discussion_r908918332
##########
format/spec.md:
##########
@@ -486,16 +486,17 @@ When reading v1 manifests with no sequence number column, sequence numbers for a
A snapshot consists of the following fields:
-| v1 | v2 | Field | Description |
-| ---------- | ---------- | ------------------------ | ----------- |
-| _required_ | _required_ | **`snapshot-id`** | A unique long ID |
-| _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent |
-| | _required_ | **`sequence-number`** | A monotonically increasing long that tracks the order of changes to a table |
-| _required_ | _required_ | **`timestamp-ms`** | A timestamp when the snapshot was created, used for garbage collection and table inspection |
-| _optional_ | _required_ | **`manifest-list`** | The location of a manifest list for this snapshot that tracks manifest files with additional metadata |
-| _optional_ | | **`manifests`** | A list of manifest file locations. Must be omitted if `manifest-list` is present |
-| _optional_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` (see below) |
-| _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
+| v1 | v2 | Field | Description |
+| ---------- | ---------- | ------------------------ |-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| _required_ | _required_ | **`snapshot-id`** | A unique long ID |
+| _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent |
+| | _required_ | **`sequence-number`** | A monotonically increasing long that tracks the order of changes to a table |
+| _required_ | _required_ | **`timestamp-ms`** | A timestamp when the snapshot was created, used for garbage collection and table inspection |
+| _optional_ | _required_ | **`manifest-list`** | The location of a manifest list for this snapshot that tracks manifest files with additional metadata |
+| _optional_ | | **`manifests`** | A list of manifest file locations. Must be omitted if `manifest-list` is present |
+| _optional_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` (see below) |
+| _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
+| _optional_ | _optional_ | **`statistics`** | A [statistics file's metadata](#statistics-file). The field should be retained by writers, unless writer updates the statistics, or knows they became obsolete. |
Review Comment:
> Are you suggesting we have a list of stats files, each of which is
> associated with a single snapshot file, and then we traverse our ancestors
> in the snapshot tree to find stats files?

In most cases, yes. That's effectively the same as carrying the last
snapshot's stats file forward. But we can also add stats after the fact by
analyzing a particular snapshot and adding the new file to the stats list.

> When we don't have a stats file for the current snapshot (or snapshot in
> question) do we return no stats files?

I think we should have the API find the "nearest" stats file and then let
the caller determine what to do. Someone can also request all stats files
with metadata and implement their own way of choosing the right stats file.

@RussellSpitzer, your example makes sense. I think the right approach is to
be able to return either the stats for a particular snapshot (by simple
lookup) or the "nearest" stats, found by walking that snapshot's ancestors.
That allows the caller to choose.
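
The two lookup modes described above can be sketched as follows. This is only an illustration under stated assumptions, not the Iceberg API: `Snapshot`, `StatisticsFile`, `TableMetadata`, and the methods `stats_for`, `nearest_stats`, `all_stats`, and `add_stats` are hypothetical names, and the sketch assumes at most one stats file per snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, minimal stand-in for a table snapshot.
    snapshot_id: int
    parent_snapshot_id: Optional[int] = None


@dataclass(frozen=True)
class StatisticsFile:
    # Hypothetical: each stats file is tied to exactly one snapshot.
    snapshot_id: int
    path: str


class TableMetadata:
    """Hypothetical holder for snapshots plus a table-level stats list."""

    def __init__(self, snapshots: Iterable[Snapshot],
                 statistics: Iterable[StatisticsFile]) -> None:
        self._snapshots: Dict[int, Snapshot] = {
            s.snapshot_id: s for s in snapshots
        }
        self._stats: Dict[int, StatisticsFile] = {
            f.snapshot_id: f for f in statistics
        }

    def stats_for(self, snapshot_id: int) -> Optional[StatisticsFile]:
        # Simple lookup: stats computed for exactly this snapshot, or None.
        return self._stats.get(snapshot_id)

    def nearest_stats(self, snapshot_id: int) -> Optional[StatisticsFile]:
        # Walk from the snapshot towards the root and return the first
        # stats file found; None if no ancestor has one.
        current = self._snapshots.get(snapshot_id)
        while current is not None:
            found = self._stats.get(current.snapshot_id)
            if found is not None:
                return found
            parent = current.parent_snapshot_id
            current = self._snapshots.get(parent) if parent is not None else None
        return None

    def all_stats(self) -> List[StatisticsFile]:
        # Lets callers apply their own selection policy.
        return list(self._stats.values())

    def add_stats(self, stats_file: StatisticsFile) -> None:
        # Stats can be added after the fact for an existing snapshot.
        self._stats[stats_file.snapshot_id] = stats_file
```

With snapshots 1 <- 2 <- 3 and a stats file only for snapshot 1, `stats_for(3)` returns nothing while `nearest_stats(3)` returns the file for snapshot 1. After analyzing snapshot 2 and calling `add_stats`, `nearest_stats(3)` returns the newer file instead, so the caller decides whether stale-but-nearby stats are acceptable.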
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]