I just listened to the recording. I'm the tech lead for MVs at Dremio and responsible for both refresh management and query rewrites with MVs.

It's great that we seem to agree that the Iceberg MV spec won't require that MVs always be up to date in order to be usable for query rewrites. There can be many data consistency issues (as Dan pointed out) but that is the state of affairs today. It sounds like we are converging on the following scenarios for an engine to validate the MV freshness:

1. Use storage table without any validation. This might be the extreme "async MV" example.

2. Ignore storage table even if one exists because the SQL command or use case requires that.

3. Use storage table only if data is not more than x hours old. This can be achieved with refresh-start-timestamp-ms, which is currently in the proposed spec.

   For this to work with MVs built on MVs, we should probably state in the spec that if an MV is built on another MV, then it needs to inherit the refresh-start-timestamp-ms of the child MV. In Steven's example, when building mv3, refresh-start-timestamp-ms needs to be set to the minimum of mv1's and mv2's refresh-start-timestamp-ms. If this property name is confusing, we can rename it to "refresh-earliest-table-timestamp-ms". I originally proposed this property and also listed out other benefits here: https://github.com/apache/iceberg/pull/11041#discussion_r1779797796

   Also, at the time, MVs built on MVs weren't being considered. Now that they are, I would recommend we have both "refresh-start-timestamp-ms" (when the refresh was started on the storage table) and "refresh-earliest-table-timestamp-ms" (used for freshness validation).

4. Don't use the storage table if it is older than X hours. This is what I had originally proposed for the *materialization.max-staleness-ms* view property here: https://github.com/apache/iceberg/pull/11041#discussion_r1744837644

   It wasn't meant to validate the freshness but more to prevent use of a materialization once it passes some age limit.

5. Use storage table if recursive validation passes, i.e. refresh-state matches the current expanded query tree state.
This is what I think Steven is calling the "synchronous MV".

Scenarios 1-4 would support the nice use case of an Iceberg client using a view's data through the storage table without needing to know how to parse/validate/expand any view SQL. In Dremio's planner, we primarily use scenarios 1 and 4 together to determine MV validity for query rewrite. Scenarios 2 and 5 also apply in certain situations. For scenario 3, Dremio only exposes "refresh-earliest-table-timestamp-ms" as an FYI to the user, but it would be interesting to allow the user to set this time so that they could run queries and be 100% certain that they were not seeing data older than x hours.

Thanks
Benny

On Wed, Oct 8, 2025 at 3:37 PM Steven Wu <[email protected]> wrote:

> correction for a typo.
>
> Prashanth brought up another scenario of compaction/rewrite where a new
> snapshot was added *with* actual data change
> -->
> Prashanth brought up another scenario of compaction/rewrite where a new
> snapshot was added *without* actual data change
>
>
> On Wed, Oct 8, 2025 at 2:12 PM Steven Wu <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks everyone for joining the MV discussion meeting. We will continue
>> to have the recurring sync meeting on Wednesday 9 am (Pacific) every 3
>> weeks until we get to the finish line where Jan's MV spec PR [1] is merged.
>> I have scheduled our next meeting on Oct 29 in the Iceberg dev events
>> calendar.
>>
>> Here is the video recording for today's meeting.
>>
>> https://drive.google.com/file/d/1-nfhBPDWLoAFDu5cKP0rwLd_30HB6byR/view?usp=sharing
>>
>> We mostly discussed freshness evaluation. Here is the meeting summary.
>>
>>    1. For tracking the refresh state for the source MV [2], the
>>    consensus is option 2 (treating source MV as a materialized table) which
>>    would give engines the flexibility on freshness determination (recursive
>>    beyond source MV or not).
>>    2. Earlier design doc [3] discussed max staleness config. But it
>>    wasn't reflected in the spec PR. The general opinion is to add the config
>>    to the spec PR. The open question is whether the
>>    `materialization.max-staleness-ms` config should be added to the view
>>    metadata or the storage table metadata. Either can work. We just need to
>>    decide which makes a little better fit.
>>    3. Prashanth brought up schema change with default value and how it
>>    may affect the MV refresh state (for SQL representation with select *).
>>    Jan mentioned that snapshot contains schema id when the snapshot was
>>    created. Engine can compare the snapshot schema id to the source table
>>    schema id during freshness evaluation. There is no need for additional
>>    schema info in refresh-state tracking in the storage table.
>>    4. Prashanth brought up another scenario of compaction/rewrite where
>>    a new snapshot was added with actual data change. The general take is that
>>    the engine can optimize and decide that MV is fresh as the new snapshot
>>    doesn't have any data change.
>>
>>
>> We can add some clarifications in the spec PR for freshness evaluation
>> based on the above discussions.
>>
>> [1] https://github.com/apache/iceberg/pull/11041
>> [2] https://docs.google.com/document/d/1_StBW5hCQhumhIvgbdsHjyW0ED3dWMkjtNzyPp9Sfr8/edit?tab=t.0
>> [3] https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?tab=t.0#heading=h.3wigecex0zls
>>
>>
>> On Thu, Sep 25, 2025 at 9:27 AM Steven Wu <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Iceberg materialized view has been discussed in the community for a long
>>> time. Thanks Jan Kaul for driving the discussion and the spec PR. It has
>>> been stalled for a long time due to lack of consensus on 1 or 2 topics. In
>>> Wed's Iceberg community sync meeting, Talat brought up the question on how
>>> to move forward and if we can have a dedicated meeting for MV.
>>>
>>> I have set up a meeting on *Oct 8 (9-10 am Pacific)*. If you subscribe
>>> to the "Iceberg Dev Events" calendar, you should be able to see it. If
>>> not, here is the link: https://meet.google.com/nfe-guyq-pqf
>>>
>>> We are going to discuss
>>> * remaining open questions
>>> * unresolved concerns
>>> * the next step and hopefully some consensus on moving forward
>>>
>>> MV spec PR is up to date. Jan has incorporated recent feedback. This
>>> should be the base of the discussion.
>>> https://github.com/apache/iceberg/pull/11041
>>>
>>> Dev discussion thread (a long-running thread started by Jan).
>>> https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf
>>>
>>> The mail archive has broken lineage and doesn't show all replies. Email
>>> subject is "*[DISCUSS] Iceberg Materialzied Views*".
>>>
>>> Thanks,
>>> Steven
>>>
>>>
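
P.S. To make the five freshness scenarios at the top of this mail a bit more concrete, here is a rough Python sketch of how an engine might combine them. It is only an illustration under my own assumptions: the names `Policy`, `earliest_table_timestamp_ms` and `can_use_storage_table` are hypothetical and come from neither the spec PR nor Dremio. Folding the MV's own refresh start time into the minimum is also an assumption, made so that an MV reading base tables directly still gets a sensible value.

```python
from enum import Enum
from typing import Iterable, Optional

HOUR_MS = 3_600_000


class Policy(Enum):
    """Hypothetical engine-side policies, one per scenario in the mail."""
    USE_WITHOUT_VALIDATION = "scenario 1"
    IGNORE_STORAGE_TABLE = "scenario 2"
    MAX_AGE = "scenarios 3 and 4"
    RECURSIVE_VALIDATION = "scenario 5"


def earliest_table_timestamp_ms(
    refresh_start_ms: int, source_mv_earliest_ms: Iterable[int] = ()
) -> int:
    """Candidate value for "refresh-earliest-table-timestamp-ms".

    Assumption: take the minimum of this MV's refresh start time and the
    earliest timestamps inherited from every MV it is built on.
    """
    return min([refresh_start_ms, *source_mv_earliest_ms])


def can_use_storage_table(
    policy: Policy,
    now_ms: int,
    earliest_ms: int,
    max_age_ms: Optional[int] = None,
    refresh_state_matches: bool = False,
) -> bool:
    """Decide whether the storage table may be used under the given policy."""
    if policy is Policy.USE_WITHOUT_VALIDATION:
        return True
    if policy is Policy.IGNORE_STORAGE_TABLE:
        return False
    if policy is Policy.MAX_AGE:
        # Scenario 3 (bound chosen by the query/user) and scenario 4 (bound
        # from a max-staleness view property) use the same comparison.
        if max_age_ms is None:
            raise ValueError("max_age_ms is required for Policy.MAX_AGE")
        return now_ms - earliest_ms <= max_age_ms
    if policy is Policy.RECURSIVE_VALIDATION:
        # The caller is assumed to have compared refresh-state against the
        # current expanded query tree already.
        return refresh_state_matches
    raise ValueError(f"unknown policy: {policy}")


# mv3 is built on mv1 and mv2 and refreshed at hour 12.
mv1_earliest = 10 * HOUR_MS
mv2_earliest = 8 * HOUR_MS
mv3_earliest = earliest_table_timestamp_ms(12 * HOUR_MS, [mv1_earliest, mv2_earliest])

now = 13 * HOUR_MS  # mv3's oldest input is therefore 5 hours old
usable_6h = can_use_storage_table(Policy.MAX_AGE, now, mv3_earliest, max_age_ms=6 * HOUR_MS)
usable_4h = can_use_storage_table(Policy.MAX_AGE, now, mv3_earliest, max_age_ms=4 * HOUR_MS)
```

In this made-up example mv3 inherits mv2's timestamp (hour 8), so it passes a 6 hour bound but fails a 4 hour bound even though mv3 itself was refreshed only an hour ago. That is the reason for tracking the earliest table timestamp separately from the refresh start.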
