JanKaul commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2631557404
##########
format/view-spec.md:
##########
@@ -160,6 +178,108 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id` | ID that `current-version-id` was set to |
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `namespace` | A list of strings for namespace levels |
+| _required_ | `name` | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot summary](https://iceberg.apache.org/spec/#snapshots) property of every storage table snapshot to determine the freshness of the precomputed data of the storage table.
+
+| Requirement | Field name | Description |
+|-------------|-----------------|-------------|
+| _required_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string |
+
+#### Freshness
+
+A materialized view's precomputed data becomes stale as the tables and views referenced in its query definition change over time. Freshness determines whether the precomputed data accurately represents the logical query definition at the current state of its dependencies.
+
+Different systems define freshness differently, based on how much of the dependency graph must be current. Some require the entire query tree to be fully up to date, while others only require direct children or allow bounded staleness at leaf nodes. As a result, "fresh" can mean strict end-to-end consistency, acceptable lag, or policy/version compliance.
+
+A materialized view is considered fresh when its precomputed data meets the freshness criteria defined by the consumer's evaluation policy. When these criteria are not met, the materialized view is considered stale.
+
+#### Refresh state
+
+The refresh state record captures the unique dependencies in the materialized view's dependency graph. These dependencies include source Iceberg tables, views, and nested materialized views that allow a consumer to determine the freshness of the materialized view.
+
+**Producer responsibilities:**
+- The producer of the storage table must provide a sufficient list of source states so that consumers can determine freshness according to the producer's interpretation.
+- The source states list may be empty if the source state cannot be determined for all objects (for example, for non-Iceberg tables).

Review Comment:
I think the intent to always treat the data as fresh should be expressed by setting `max-staleness-ms` to `null`, if that is what you mean by "doesn't require staleness determination".

The issue is that an empty list is ambiguous: "cannot be determined" and "doesn't need to be determined" call for different behavior. In the "cannot be determined" case, the consumer should use the coarse-grained check. In the "doesn't need to be determined" case, the consumer should simply treat the data as fresh.
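To make the proposed disambiguation concrete, here is a minimal Python sketch of the consumer-side decision. It is not PyIceberg API, and it rests on several assumptions. The `source-states` key name is hypothetical, since the excerpt only mentions a "source states list". Where `max-staleness-ms` is stored is not shown in this hunk, so the sketch passes it in as a parameter. `coarse_grained_check` and `source_is_current` are hypothetical callbacks that stand in for checks this excerpt does not define.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical callback types: the excerpt does not define these checks.
CoarseCheck = Callable[[int], bool]
SourceCheck = Callable[[Dict[str, Any], int], bool]


def is_fresh(
    snapshot_summary: Dict[str, str],
    max_staleness_ms: Optional[int],
    coarse_grained_check: CoarseCheck,
    source_is_current: SourceCheck,
) -> bool:
    """Sketch of the consumer decision proposed in the review comment."""
    # A null max-staleness-ms signals "always treat as fresh", so the
    # refresh state is never inspected.
    if max_staleness_ms is None:
        return True

    # "refresh-state" is stored as a JSON-encoded string in the snapshot summary.
    refresh_state = json.loads(snapshot_summary["refresh-state"])
    # Assumed (hypothetical) key name for the source states list.
    source_states = refresh_state.get("source-states", [])

    # With the null case handled above, an empty list can only mean
    # "cannot be determined": fall back to the coarse-grained check.
    if not source_states:
        return coarse_grained_check(max_staleness_ms)

    # Otherwise, every recorded source state must satisfy the consumer's policy.
    return all(source_is_current(state, max_staleness_ms) for state in source_states)
```

The ordering carries the argument. The `null` check runs before the list is inspected, so an empty list never has to carry two meanings.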
