Re: [PR] Materialized View Spec [iceberg]

via GitHub Fri, 17 Oct 2025 21:33:27 -0700


yyanyy commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2412249832



##########
format/view-spec.md:
##########
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location.
 
+### Materialized Views
+
+Materialized views are a type of view with precomputed results from the view query stored as a table.
+When queried, engines may return the precomputed data for the materialized views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data.
+The metadata for a materialized view extends the Iceberg view metadata, adding a pointer to the precomputed data and refresh information to determine if the data is still fresh.
+The refresh information is composed of data about the so-called "source tables", which are the tables referenced in the query definition of the materialized view.
+The storage table can be in the states of "fresh", "stale" or "invalid", which are determined from the following situations:
+* **fresh** -- The `snapshot_id`s of the last refresh operation match the current `snapshot_id`s of the source tables.
+* **stale** -- The `snapshot_id`s do not match, indicating that a refresh operation needs to be performed to capture the latest source table changes.
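The fresh/stale rule in the quoted text boils down to a per-source-table `snapshot_id` comparison. The Python below is only a minimal sketch under stated assumptions: the refresh information is modeled as a plain dict from a source table name to the `snapshot_id` recorded at the last refresh, and a source table missing from the current lookup is treated as a mismatch. The names `StorageState` and `storage_table_state` are hypothetical, not part of any Iceberg API, and the "invalid" state, which the hunk mentions but does not define here, is not modeled.

```python
from enum import Enum
from typing import Mapping


class StorageState(Enum):
    FRESH = "fresh"
    STALE = "stale"
    # "invalid" is named in the spec text but not defined in this hunk, so it is not modeled.


def storage_table_state(
    refreshed_snapshots: Mapping[str, int],
    current_snapshots: Mapping[str, int],
) -> StorageState:
    """Classify a storage table by comparing source-table snapshot IDs.

    refreshed_snapshots: source table name -> snapshot_id recorded at the last refresh
    current_snapshots:   source table name -> that table's current snapshot_id
    """
    for table, refreshed_id in refreshed_snapshots.items():
        # Assumption: any mismatch, including a table absent from current_snapshots,
        # means the precomputed data no longer reflects that source table.
        if current_snapshots.get(table) != refreshed_id:
            return StorageState.STALE
    return StorageState.FRESH


# Usage: snapshot IDs recorded when the materialized view was last refreshed.
last_refresh = {"db.orders": 101, "db.customers": 202}
print(storage_table_state(last_refresh, {"db.orders": 101, "db.customers": 202}).value)  # fresh
print(storage_table_state(last_refresh, {"db.orders": 105, "db.customers": 202}).value)  # stale
```

In this sketch the check only needs each source table's current `snapshot_id`, so deciding freshness does not require reading any data files.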

Review Comment:
   I personally don't have first-hand experience working with MVs, but here's my 2 cents: I'm not completely sure we need to couple this with the lineage discussion, since I feel they may serve different purposes. From the doc Steven shared, I think the main purpose and advantage of option 2 is to help engines determine whether the MV is stale. Obviously the exact criteria for this are engine-specific, but as a high-level guess, when the use case of someone creating MV1 from MV2 and MV3 emerges, that user/engine would most likely expect MV1 to be refreshed based on MV2's and MV3's data, instead of recursively tracing back to the most deeply nested source tables and starting from there. Otherwise there's not much point in creating MV2 and MV3 as materialized views and using them as the children of MV1; normal views would be enough. Because of this, I think the fact that it's a nested MV makes the "materialized" part of the child views more interesting for engine processing.
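To make the nested case concrete, here is a minimal Python sketch under stated assumptions: a materialized view is modeled as a hypothetical `MaterializedView` holding its storage table, its direct sources, and the snapshot IDs recorded at its last refresh, and a child MV used as a source is tracked through its storage table's `snapshot_id`. `is_fresh_shallow` illustrates the reading suggested above (MV1 is judged against MV2's and MV3's materialized data), while `is_fresh_recursive` is the alternative that also requires every nested view to be fresh against its own sources. None of these names are Iceberg APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Table:
    snapshot_id: int


@dataclass
class MaterializedView:
    # Hypothetical model: storage table, direct sources keyed by name, and the
    # snapshot IDs of those sources recorded at the last refresh.
    storage: Table
    sources: Dict[str, Union[Table, "MaterializedView"]]
    refreshed: Dict[str, int] = field(default_factory=dict)


def current_snapshot(source: Union[Table, MaterializedView]) -> int:
    # A materialized view used as a source is read through its storage table.
    if isinstance(source, MaterializedView):
        return source.storage.snapshot_id
    return source.snapshot_id


def is_fresh_shallow(mv: MaterializedView) -> bool:
    """Compare only direct sources; child MVs count via their storage tables."""
    return all(mv.refreshed.get(name) == current_snapshot(src) for name, src in mv.sources.items())


def is_fresh_recursive(mv: MaterializedView) -> bool:
    """Additionally require every nested MV to be fresh against its own sources."""
    return is_fresh_shallow(mv) and all(
        is_fresh_recursive(src) for src in mv.sources.values() if isinstance(src, MaterializedView)
    )


# Usage: MV1 is built from MV2 and MV3, which are built from base tables.
orders, customers = Table(1), Table(2)
mv2 = MaterializedView(Table(10), {"orders": orders}, {"orders": 1})
mv3 = MaterializedView(Table(20), {"customers": customers}, {"customers": 2})
mv1 = MaterializedView(Table(30), {"mv2": mv2, "mv3": mv3}, {"mv2": 10, "mv3": 20})

orders.snapshot_id = 5  # a new commit lands on a base table of MV2
print(is_fresh_shallow(mv1))    # True: MV2's storage table has not changed
print(is_fresh_recursive(mv1))  # False: MV2 itself is now stale
```

Under the shallow check, MV1 only needs a refresh once MV2 or MV3 is re-materialized, which is what makes the child views' "materialized" part meaningful to the engine.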


