igorbelianski-cyber commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r3319009331


##########
format/view-spec.md:
##########
@@ -160,7 +178,116 @@ Each entry in `version-log` is a struct with the 
following fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
-## Appendix A: An Example
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
+The `refresh-state` property is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of a storage 
table snapshot to provide information about the state of the precomputed data.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _optional_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view is **fresh** when the storage table represents the result 
of the current view query.
+
+A change to the materialized view's definition produces a new 
`view-version-id`; any storage-table snapshot recorded at a prior 
`view-version-id` is not fresh under the current definition.
+
+#### Refresh state
+
+The refresh state record captures the state of dependencies in the 
materialized view's dependency graph. A dependency is recorded in 
`source-states` as either a `table` entry (a source table or an upstream 
materialized view's storage table) and/or a `view` entry. Upstream materialized 
views can be stored as a `view` and a `table` entry.
+
+The refresh state has the following fields:
+
+| Requirement | Field name                   | Description |
+|-------------|------------------------------|-------------|
+| _required_  | `view-version-id`            | The `version-id` of the 
materialized view when the refresh operation was performed |
+| _required_  | `source-states`              | A list of [source 
state](#source-state) records |
+| _required_  | `refresh-start-timestamp-ms` | A timestamp of when the refresh 
operation was started |
+
+##### Producer: Recording Refresh State
+
+Producers may selectively choose a subset of their dependencies to record — 
for example, skipping non-Iceberg sources or recording an empty list. See 
[Appendix B](#appendix-b-what-counts-as-a-dependency) for strategies on how to 
store dependency state.
+
+When writing the refresh state, producers:
+
+- **Must** record `view-version-id` and `refresh-start-timestamp-ms`.
+- **Should** include all distinct source states for the inputs they chose to 
track (diamond dependency pattern).
+- **May** leave `source-states` empty (e.g., when sources are non-Iceberg or 
freshness is determined by a mechanism outside this spec).
+
+##### Consumer: Evaluating Refresh State
+
+Consumers may use any combination of the following to assess the freshness of 
the storage table:
+
+- **Recency policy.** Accept the storage table when 
`refresh-start-timestamp-ms` falls within a staleness window. A recency policy 
bounds data age but does not establish freshness.
+- **Trust the recorded `source-states`.** Compare each entry against the 
current catalog state — `snapshot-id` for tables, `version-id` for views, 
optionally recursive verification for upstream materialized views recorded by 
their storage tables. Also confirm that the recorded `view-version-id` equals 
the materialized view's current `view-version-id`.
+- **Verify by parsing the view query.** Derive the dependency set from the SQL 
and confirm every dependency is covered by `source-states` and matches the 
current state. Treat any uncovered dependency as undetermined.
+
+If a consumer's assessment passes, it reads from the storage table. If not, 
the consumer may fail the query, evaluate the view query directly, or apply 
another strategy.
+
+#### Source state
+
+Source state records capture the state of objects referenced by a materialized 
view. Each record has a `type` field that determines its form:
+
+| Type    | Description |
+|---------|-------------|
+| `table` | An Iceberg table — either a source table in the dependency graph, 
or the storage table of an upstream materialized view |
+| `view`  | An Iceberg view in the dependency graph |
+
+An upstream materialized view may be recorded as a `view` entry referencing 
its view metadata and one or more `table` entries referencing its storage 
table or other source tables. These source table entries might be determined by 
recursively expanding its own dependencies.

Review Comment:
   nit: "An upstream materialized view may be recorded as its view 
metadata and/or its storage tables and/or any other tables discovered by 
recursively expanding the view’s dependencies."
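
   To make the wording concrete, here is a minimal sketch of how an upstream
   materialized view could appear as both a `view` entry and a `table` entry,
   and how a consumer might apply the "trust the recorded `source-states`"
   check. Only `view-version-id`, `source-states`,
   `refresh-start-timestamp-ms`, `type`, `snapshot-id` and `version-id` come
   from the quoted text; the `name` key, the identifiers, the `is_fresh`
   helper, and the dict-based stand-in for catalog state are assumptions made
   for illustration, not part of the spec.

```python
import json

# Hypothetical refresh-state record, JSON-encoded as it would be stored in the
# snapshot summary. The `name` key and all values are illustrative assumptions.
refresh_state_json = json.dumps({
    "view-version-id": 3,
    "refresh-start-timestamp-ms": 1700000000000,
    "source-states": [
        # Upstream materialized view recorded as a view entry ...
        {"type": "view", "name": "db.upstream_mv", "version-id": 7},
        # ... and as a table entry for its storage table.
        {"type": "table", "name": "db.upstream_mv_storage", "snapshot-id": 101},
        # Plain source table.
        {"type": "table", "name": "db.orders", "snapshot-id": 55},
    ],
})


def is_fresh(state_json, current_view_version_id, current_snapshots, current_versions):
    """Compare recorded source states against current catalog state.

    current_snapshots maps table name -> current snapshot-id,
    current_versions maps view name -> current version-id
    (both are stand-ins for real catalog lookups).
    """
    state = json.loads(state_json)
    # The recorded view version must match the view's current version.
    if state["view-version-id"] != current_view_version_id:
        return False
    for entry in state["source-states"]:
        if entry["type"] == "table":
            if current_snapshots.get(entry["name"]) != entry["snapshot-id"]:
                return False
        elif entry["type"] == "view":
            if current_versions.get(entry["name"]) != entry["version-id"]:
                return False
        else:
            # Unknown entry type: treat as undetermined, so not fresh.
            return False
    # Note: an empty source-states list passes this check; a real consumer
    # would likely combine it with a recency policy or another mechanism.
    return True
```

   The sketch does no recursive verification of upstream materialized views
   and does not parse the view query; it only covers the direct comparison
   described in the second consumer bullet.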



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

