bennychow commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2627854951


##########
format/view-spec.md:
##########
@@ -160,6 +177,89 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of every storage 
table snapshot to determine the freshness of the precomputed data of the 
storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view is considered fresh when its precomputed data is usable by 
consumers. As tables referenced by a materialized view change over time, the 
precomputed data may no longer accurately reflect the logical materialized view 
definition. When this occurs, the materialized view (storage table) is 
considered stale.
+
+Different systems interpret freshness differently, typically based on the 
objects referenced in the fully expanded query tree of the materialized view. 
Some systems consider only direct children, others only leaf nodes, and some 
the entire query tree. The specific interpretation is determined by the 
producer of the storage table.
+
+#### Refresh state
+
+The refresh state record captures the state of source tables, views, and 
materialized views at refresh time. It contains a list of directly or 
indirectly referenced source states that allow a consumer to determine the 
freshness of the materialized view.

Review Comment:
   Suggestion:
   
   The refresh state record captures the **unique dependencies in the 
materialized view's dependency graph**. These dependencies include source 
Iceberg tables, views, and **nested** materialized views, and they allow a 
consumer to determine the freshness of the materialized view.
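   
   To illustrate what "unique dependencies" could look like once serialized, 
here is a minimal sketch. It assumes the record is deduplicated before being 
JSON-encoded into the snapshot summary. Only `view-version-id`, 
`source-states`, `refresh-start-timestamp-ms`, and the `type` field come from 
the spec text; the other source-state fields (`namespace`, `name`, 
`snapshot-id`, `version-id`) and the helper `unique_dependencies` are 
hypothetical.

```python
import json

# Hypothetical source-state entries. Only `type` is taken from the spec text;
# `namespace`, `name`, `snapshot-id`, and `version-id` are assumptions.
source_states = [
    {"type": "table", "namespace": ["sales"], "name": "orders", "snapshot-id": 101},
    # The same table reached through a second path in the dependency graph.
    {"type": "table", "namespace": ["sales"], "name": "orders", "snapshot-id": 101},
    {"type": "view", "namespace": ["sales"], "name": "daily_orders", "version-id": 3},
]


def unique_dependencies(states):
    """Drop duplicate entries while keeping first-seen order."""
    seen = set()
    unique = []
    for state in states:
        key = json.dumps(state, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(state)
    return unique


refresh_state = {
    "view-version-id": 7,
    "source-states": unique_dependencies(source_states),
    "refresh-start-timestamp-ms": 1700000000000,
}

# The record is stored as a JSON-encoded string under `refresh-state`.
snapshot_summary = {"refresh-state": json.dumps(refresh_state)}
```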
   
   



##########
format/view-spec.md:
##########
@@ -160,6 +177,89 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of every storage 
table snapshot to determine the freshness of the precomputed data of the 
storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view is considered fresh when its precomputed data is usable by 
consumers. As tables referenced by a materialized view change over time, the 
precomputed data may no longer accurately reflect the logical materialized view 
definition. When this occurs, the materialized view (storage table) is 
considered stale.
+
+Different systems interpret freshness differently, typically based on the 
objects referenced in the fully expanded query tree of the materialized view. 
Some systems consider only direct children, others only leaf nodes, and some 
the entire query tree. The specific interpretation is determined by the 
producer of the storage table.

Review Comment:
   Suggestion:
   
   Different systems define freshness differently, **based on how much of the 
dependency graph must be current**. Some require the entire query tree to be 
fully up to date, while others only require direct children or allow bounded 
staleness at leaf nodes. As a result, “fresh” can mean strict end-to-end 
consistency, acceptable lag, or policy/version compliance.
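   
   A rough sketch of how the three interpretations select different parts of 
the dependency graph. The graph, the policy names (`direct-children`, 
`leaf-nodes`, `entire-tree`), and the function `objects_to_check` are all 
hypothetical and only meant to make the distinction concrete.

```python
# Hypothetical dependency graph: each key maps to the objects it reads from.
GRAPH = {
    "mv": ["view_a", "table_c"],
    "view_a": ["table_a", "table_b"],
    "table_a": [],
    "table_b": [],
    "table_c": [],
}


def objects_to_check(graph, root, policy):
    """Return the set of objects whose state matters under a given policy."""
    if policy == "direct-children":
        return set(graph[root])

    # Collect everything reachable from the root (excluding the root itself).
    reachable = set()
    stack = list(graph[root])
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(graph[node])

    if policy == "leaf-nodes":
        return {node for node in reachable if not graph[node]}
    if policy == "entire-tree":
        return reachable
    raise ValueError(f"unknown policy: {policy}")
```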



##########
format/view-spec.md:
##########
@@ -42,12 +42,24 @@ An atomic swap of one view metadata file for another 
provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current 
metadata location will not be changed before the writer's commit. Once a writer 
has created an update, it commits by swapping the view's metadata file pointer 
from the base location to the new location.
 
+### Materialized Views
+
+Materialized views are a type of view with precomputed results from the view 
query stored as a table.
+When queried, engines may return the precomputed data for the materialized 
views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view 
and an underlying Iceberg table, the "storage-table", which stores the 
precomputed data.
+Materialized View metadata is a superset of View metadata with an additional 
pointer to the storage table. The storage table is an Iceberg table with 
additional materialized view refresh state metadata.
+Refresh metadata contains information about the "source tables" and/or "source 
views", which are the tables/views referenced in the query definition of the 
materialized view.
+
 ## Specification
 
 ### Terms
 
 * **Schema** -- Names and types of fields in a view.
 * **Version** -- The state of a view at some point in time.
+* **Storage table** -- Iceberg table that stores the precomputed data of a 
materialized view.
+* **Source table** -- A table reference that occurs in the query definition of 
a materialized view. The materialized view depends on the data from the source 
tables.
+* **Source view** -- A view reference that occurs in the query definition of a 
materialized view. The materialized view depends on the definitions from the 
source views.

Review Comment:
   Suggestion:
   
   Add this term:
   
   **Nested materialized view** -- A materialized view that the current 
materialized view depends on and that is used when refreshing it.



##########
format/view-spec.md:
##########
@@ -160,6 +177,89 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of every storage 
table snapshot to determine the freshness of the precomputed data of the 
storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view is considered fresh when its precomputed data is usable by 
consumers. As tables referenced by a materialized view change over time, the 
precomputed data may no longer accurately reflect the logical materialized view 
definition. When this occurs, the materialized view (storage table) is 
considered stale.

Review Comment:
   Suggestion:
   
   A materialized view is considered fresh when its precomputed data is usable 
by consumers. As tables **and views** referenced by a materialized view change 
over time, the precomputed data may no longer accurately **reflect the 
materialized view's dependency graph**. When this occurs, the materialized view 
(storage table) is considered stale.
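   
   For reference, the consumer bullets elsewhere in this PR require at least a 
coarse-grained evaluation based on `refresh-start-timestamp-ms` and 
`max-staleness-ms`. A minimal sketch of that check, assuming "fresh" means the 
refresh started no longer than `max-staleness-ms` ago; the function name and 
the `now_ms` parameter are hypothetical, and the exact comparison is my 
assumption rather than something the spec text states.

```python
import time


def is_fresh_coarse(refresh_start_timestamp_ms, max_staleness_ms, now_ms=None):
    """Coarse freshness check: compare the refresh age to the allowed staleness.

    Assumption: the view counts as fresh while the age is at most
    `max_staleness_ms`.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    age_ms = now_ms - refresh_start_timestamp_ms
    return age_ms <= max_staleness_ms
```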



##########
format/view-spec.md:
##########
@@ -160,6 +177,89 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of every storage 
table snapshot to determine the freshness of the precomputed data of the 
storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view is considered fresh when its precomputed data is usable by 
consumers. As tables referenced by a materialized view change over time, the 
precomputed data may no longer accurately reflect the logical materialized view 
definition. When this occurs, the materialized view (storage table) is 
considered stale.
+
+Different systems interpret freshness differently, typically based on the 
objects referenced in the fully expanded query tree of the materialized view. 
Some systems consider only direct children, others only leaf nodes, and some 
the entire query tree. The specific interpretation is determined by the 
producer of the storage table.
+
+#### Refresh state
+
+The refresh state record captures the state of source tables, views, and 
materialized views at refresh time. It contains a list of directly or 
indirectly referenced source states that allow a consumer to determine the 
freshness of the materialized view.
+
+**Producer responsibilities:**
+- The producer of the storage table must provide a sufficient list of source 
states so that consumers can determine freshness according to the producer's 
interpretation.
+- The source states list may be empty if the source state cannot be determined 
for all objects (for example, for non-Iceberg tables).
+
+**Consumer evaluation:**
+- The consumer must at least perform a coarse-grained evaluation based on 
`refresh-start-timestamp-ms` and `max-staleness-ms`.
+- The consumer may additionally compare the `source-states` list against the 
states loaded from the catalog.
+- The consumer trusts that the producer has provided all states necessary to 
determine freshness.
+
+The refresh state has the following fields:
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `view-version-id`         | The `version-id` of the 
materialized view when the refresh operation was performed  |
+| _required_  | `source-states`        | A list of [source 
states](#source-state) records |
+| _required_  | `refresh-start-timestamp-ms` | A timestamp of when the refresh 
operation was started |
+
+#### Source state
+
+Materialized views can reference source objects of different types, such as 
Iceberg tables and views. Source state records have a common field `type` that 
determines the form, which can be one of the following:
+
+* `table`: An Iceberg table
+* `view`: An Iceberg view

Review Comment:
   Discussion:
   
   Could we make it easier for the consumer to know whether a producer used a 
nested MV? The producer could separate out the nested MV with an additional 
type:
   
   * `materialized-view`: An Iceberg materialized view used to refresh the 
current storage table
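   
   With such a type, the consumer-side check could be as simple as the sketch 
below. It assumes the proposed `materialized-view` value is adopted and that 
the refresh state is read back from its JSON-encoded string; the function name 
is hypothetical.

```python
import json


def uses_nested_materialized_view(refresh_state_json):
    """Return True if any source state has the proposed `materialized-view` type."""
    refresh_state = json.loads(refresh_state_json)
    return any(
        state.get("type") == "materialized-view"
        for state in refresh_state["source-states"]
    )
```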
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

