bennychow commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r3320692303


##########
format/view-spec.md:
##########
@@ -322,3 +449,215 @@ s3://bucket/warehouse/default.db/event_agg/metadata/00002-(uuid).metadata.json
   } ]
 }
 ```
+
+### Materialized View Example
+
+Imagine the following operation, which creates a materialized view that precomputes daily event counts:
+
+```sql
+USE prod.default
+```
+```sql
+CREATE MATERIALIZED VIEW event_agg_mv (
+    event_count COMMENT 'Count of events',
+    event_date)
+COMMENT 'Precomputed daily event counts'
+AS
+SELECT
+    COUNT(1), CAST(event_ts AS DATE)
+FROM events
+GROUP BY 2
+```
+
+The materialized view metadata JSON file looks as follows:
+
+```
+s3://bucket/warehouse/default.db/event_agg_mv/metadata/00001-(uuid).metadata.json
+```
+```json
+{
+  "view-uuid": "b2a12651-3038-4a72-8a31-5027ab84da35",
+  "format-version" : 1,
+  "location" : "s3://bucket/warehouse/default.db/event_agg_mv",
+  "current-version-id" : 1,
+  "properties" : {
+    "comment" : "Precomputed daily event counts"
+  },
+  "versions" : [ {
+    "version-id" : 1,
+    "timestamp-ms" : 1573518431292,
+    "schema-id" : 1,
+    "default-catalog" : "prod",
+    "default-namespace" : [ "default" ],
+    "summary" : {
+      "engine-name" : "Spark",
+      "engine-version" : "3.4.1"
+    },
+    "representations" : [ {
+      "type" : "sql",
+      "sql" : "SELECT\n    COUNT(1), CAST(event_ts AS DATE)\nFROM events\nGROUP BY 2",
+      "dialect" : "spark"
+    } ],
+    "storage-table" : {
+      "namespace" : [ "default" ],
+      "name" : "event_agg_mv__storage"
+    }
+  } ],
+  "schemas": [ {
+    "schema-id": 1,
+    "type" : "struct",
+    "fields" : [ {
+      "id" : 1,
+      "name" : "event_count",
+      "required" : false,
+      "type" : "int",
+      "doc" : "Count of events"
+    }, {
+      "id" : 2,
+      "name" : "event_date",
+      "required" : false,
+      "type" : "date"
+    } ]
+  } ],
+  "version-log" : [ {
+    "timestamp-ms" : 1573518431292,
+    "version-id" : 1
+  } ]
+}
+```
+
+After a refresh operation, the storage table's snapshot summary contains the `refresh-state` property.
+The following is an example of the `refresh-state` JSON value stored in the snapshot summary of the storage table:
+
+```json
+{
+  "view-version-id" : 1,
+  "refresh-start-timestamp-ms" : 1573518435000,
+  "source-states" : [ {
+    "type" : "table",
+    "namespace" : [ "default" ],
+    "name" : "events",
+    "uuid" : "d4a10b5c-1e8a-4b72-9d67-3f4a8c9e1b2d",
+    "snapshot-id" : 6148331192489823102
+  } ]
+}
+```
+
+## Appendix B: Example strategies for selecting dependencies
+
+Producers may select different sets of dependencies to record in the refresh state. The strategies below illustrate common choices against the same shared query.
+
+### Shared query
+
+- `A` (the materialized view being refreshed): `SELECT ... FROM B JOIN C ON ...`
+- `B` (regular view): `SELECT ... FROM E JOIN D ON ...`
+- `C` (regular view or materialized view, varies by strategy): `SELECT ... FROM F JOIN G ON ...`
+- `D` (regular view or materialized view, varies by strategy): `SELECT ... FROM H WHERE ...`
+- `E`, `F`, `G`, `H`: base Iceberg tables
+
+### Strategy 1: Track all nested tables and views (no nested MVs)
+
+The view query reads only base tables and regular views. The refresh state tracks snapshot IDs of all deeply nested base tables and version IDs of all views traversed. Reuse of the storage table is sensitive to changes in any of them.
+
+`C` and `D` are regular views.

Review Comment:
   I think we should explicitly point out that the storage tables for C and D 
were not used during the refresh by the producer.
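
   To illustrate the point, here is a minimal Python sketch of the Strategy 1 traversal, assuming the dependency graph from the "Shared query" list is already known. The names `VIEW_SOURCES`, `BASE_TABLES`, and `collect_dependencies` are hypothetical and not part of any Iceberg API. The sketch only shows that every nested view is expanded down to base tables, so no storage table for `C` or `D` is ever consulted.

```python
# Hypothetical, hand-written dependency graph copied from the "Shared query" list.
# Keys are views; values are the relations each view's query reads directly.
VIEW_SOURCES = {
    "A": ["B", "C"],
    "B": ["E", "D"],
    "C": ["F", "G"],
    "D": ["H"],
}
BASE_TABLES = {"E", "F", "G", "H"}


def collect_dependencies(root):
    """Return (views, tables) reachable from root, expanding every view.

    Illustrative assumption for Strategy 1: every nested view is treated as
    a regular view and expanded, so storage tables are never looked up.
    """
    views, tables = [], []
    seen = set()
    # Depth-first walk over the relations read by the root view.
    stack = list(reversed(VIEW_SOURCES[root]))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in BASE_TABLES:
            tables.append(name)  # would record a snapshot ID
        else:
            views.append(name)   # would record a view version ID
            stack.extend(reversed(VIEW_SOURCES[name]))
    return views, tables


views, tables = collect_dependencies("A")
print("views tracked by version ID:", sorted(views))
print("tables tracked by snapshot ID:", sorted(tables))
```

   Under these assumptions, the refresh state for `A` would carry one entry per base table (`E`, `F`, `G`, `H`) and one per traversed view (`B`, `C`, `D`), which is why the text could state outright that the storage tables were not read.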



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

