danielcweeks commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r3319257256
##########
format/view-spec.md:
##########
@@ -160,7 +178,116 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id` | ID that `current-version-id` was set to |
-## Appendix A: An Example
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `namespace` | A list of strings for namespace levels |
+| _required_ | `name` | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that supplements the regular table metadata and is required for materialized views.
+The `refresh-state` property is set on the [snapshot summary](https://iceberg.apache.org/spec/#snapshots) property of a storage table snapshot to provide information about the state of the precomputed data.
+
+| Requirement | Field name | Description |
+|-------------|-----------------|-------------|
+| _optional_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view is **fresh** when the storage table represents the result of the current view query.
+
+A change to the materialized view's definition produces a new `view-version-id`; any storage-table snapshot recorded at a prior `view-version-id` is not fresh under the current definition.
+
+#### Refresh state
+
+The refresh state record captures the state of dependencies in the materialized view's dependency graph. A dependency is recorded in `source-states` as either a `table` entry (a source table or an upstream materialized view's storage table) and/or a `view` entry. Upstream materialized views can be stored as a `view` and a `table` entry.
+
+The refresh state has the following fields:
+
+| Requirement | Field name | Description |
+|-------------|------------------------------|-------------|
+| _required_ | `view-version-id` | The `version-id` of the materialized view when the refresh operation was performed |
+| _required_ | `source-states` | A list of [source state](#source-state) records |
+| _required_ | `refresh-start-timestamp-ms` | A timestamp of when the refresh operation was started |
+
+##### Producer: Recording Refresh State
+
+Producers may selectively choose a subset of their dependencies to record — for example, skipping non-Iceberg sources or recording an empty list. See [Appendix B](#appendix-b-what-counts-as-a-dependency) for strategies on how to store dependency state.
+
+When writing the refresh state, producers:
+
+- **Must** record `view-version-id` and `refresh-start-timestamp-ms`.
+- **Should** include all distinct source states for the inputs they chose to track (diamond dependency pattern).
+- **May** leave `source-states` empty (e.g., when sources are non-Iceberg or freshness is determined by a mechanism outside this spec).
+
+##### Consumer: Evaluating Refresh State
+
+Consumers may use any combination of the following to assess the freshness of the storage table:

Review Comment:
```suggestion
Consumers may use any combination of the following to assess the state of dependencies used to produce the storage table.
```
This line introduces the issue of multiple definitions of "freshness". We need to relax this.
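
For readers following the thread, here is a minimal Python sketch, not taken from the PR, of the record under discussion. It builds a `refresh-state` record from the three fields listed in the hunk and stores it as a JSON-encoded string in a snapshot summary, modeled here as a plain dict. The function names are hypothetical, and `source-states` is left empty (which the hunk permits) because the source state fields are not shown in this excerpt. The consumer-side check only tests whether the snapshot was recorded at the current `view-version-id`; it is one possible signal and does not define freshness.

```python
import json


def build_refresh_state(view_version_id, refresh_start_ms, source_states=None):
    """Assemble a refresh-state record using the field names from the hunk."""
    return {
        "view-version-id": view_version_id,
        # Producers may leave this list empty; entry layout is not shown here.
        "source-states": list(source_states or []),
        "refresh-start-timestamp-ms": refresh_start_ms,
    }


def to_snapshot_summary(refresh_state):
    """Store the record as a JSON-encoded string under `refresh-state`."""
    # Assumption: the snapshot summary is modeled as a plain string-to-string dict.
    return {"refresh-state": json.dumps(refresh_state)}


def recorded_at_current_version(summary, current_version_id):
    """Return True if the snapshot was recorded at the view's current version."""
    raw = summary.get("refresh-state")
    if raw is None:
        # The property is optional, so it may be absent from the summary.
        return False
    return json.loads(raw)["view-version-id"] == current_version_id


summary = to_snapshot_summary(build_refresh_state(3, 1700000000000))
```

With this sketch, a consumer holding `summary` can call `recorded_at_current_version(summary, current_version_id)` and combine the result with whatever other dependency checks it chooses to apply.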
