igorbelianski-cyber commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2631483271


##########
format/view-spec.md:
##########
@@ -160,6 +178,108 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of every storage 
table snapshot to determine the freshness of the precomputed data of the 
storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view's precomputed data becomes stale as the tables and views 
referenced in its query definition change over time. Freshness determines 
whether the precomputed data accurately represents the logical query definition 
at the current state of its dependencies.
+
+Different systems define freshness differently, based on how much of the 
dependency graph must be current. Some require the entire query tree to be 
fully up to date, while others only require direct children or allow bounded 
staleness at leaf nodes. As a result, "fresh" can mean strict end-to-end 
consistency, acceptable lag, or policy/version compliance.
+
+A materialized view is considered fresh when its precomputed data meets the 
freshness criteria defined by the consumer's evaluation policy. When these 
criteria are not met, the materialized view is considered stale.
+
+#### Refresh state
+
+The refresh state record captures the unique dependencies in the materialized 
view's dependency graph. These dependencies include source Iceberg tables, 
views, and nested materialized views that allow a consumer to determine the 
freshness of the materialized view.
+
+**Producer responsibilities:**
+- The producer of the storage table must provide a sufficient list of source 
states so that consumers can determine freshness according to the producer's 
interpretation.
+- The source states list may be empty if the source state cannot be determined 
for all objects (for example, for non-Iceberg tables).
+
+**Consumer evaluation:**
+- The consumer must at least perform a coarse-grained evaluation based on 
`refresh-start-timestamp-ms` and `max-staleness-ms`. A materialized view is 
fresh if `refresh-start-timestamp-ms` is within the window `[now - 
max-staleness-ms, now]`.
+- The consumer may additionally compare the `source-states` list against the 
states loaded from the catalog. If this evaluation determines the materialized 
view is fresh, it overrides the coarse-grained evaluation result.
+- The consumer may parse the view definition to implement a more sophisticated 
policy.
+- When a materialized view is considered stale, the consumer can fail, refresh 
inline, or treat the materialized view as a logical view. The consumer must not 
consume from the storage table when the materialized view is stale.

Review Comment:
   "must not consume from the storage table when the materialized view is stale."
   We discussed false negatives at length (the view is fresh, but the consumer 
cannot reliably detect it).
   
   Maybe something like:
   "must not consume from the storage table when the materialized view doesn't 
meet the freshness criteria."
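
   To illustrate why the wording matters, here is a rough sketch of the consumer 
evaluation as described in the diff above. It is not taken from the PR or from 
any Iceberg implementation: `RefreshState`, `meets_freshness_criteria`, 
`choose_read_path`, and the shape of `source_states` (identifier mapped to a 
snapshot ID) are all hypothetical, and `max_staleness_ms` is simply passed in 
as a parameter. The point is that the consumer can only decide whether its 
criteria are met, not whether the view is actually stale.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class RefreshState:
    """Hypothetical in-memory form of the `refresh-state` record."""
    refresh_start_timestamp_ms: int
    # Assumed shape: source identifier -> snapshot ID. May be empty.
    source_states: Dict[str, int] = field(default_factory=dict)


def meets_freshness_criteria(
    state: RefreshState,
    max_staleness_ms: int,
    now_ms: int,
    catalog_states: Optional[Dict[str, int]] = None,
) -> bool:
    """Return True only if the consumer can positively establish freshness."""
    # Coarse-grained check: refresh start lies in [now - max-staleness-ms, now].
    lower_bound = now_ms - max_staleness_ms
    if lower_bound <= state.refresh_start_timestamp_ms <= now_ms:
        return True

    # Optional fine-grained check: a match against the catalog overrides the
    # coarse-grained result. An empty list cannot prove anything.
    if catalog_states is not None and state.source_states:
        return state.source_states == catalog_states

    # Criteria not met. The view may still be fresh (a false negative),
    # but the consumer has no reliable way to tell.
    return False


def choose_read_path(criteria_met: bool) -> str:
    """Never read the storage table unless the criteria are met."""
    return "storage-table" if criteria_met else "logical-view"
```

   In the last branch the function returns `False` even though the underlying 
data may be fresh, which is the false-negative case. The rule "must not consume 
when the criteria are not met" is something the consumer can actually enforce, 
whereas "when the view is stale" is not always decidable.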



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

