I just left a long comment on the issue. Thanks for picking this up, Jan!

My main comments echoed what Walaa was talking about here. I also offered a
different solution, which is to keep the MV metadata in snapshot
properties in the table. That makes things smaller and I think it also
addresses some of the concerns for commits. There should be no need to
update view metadata for this (other than to point to materialized
versions) since we can detect when a materialized copy is up to date.

On Thu, Dec 22, 2022 at 1:10 AM Jan Kaul <jank...@mailbox.org.invalid>
wrote:

> Hi Walaa,
>
> as you pointed out, design 1 in the github issue
> <https://github.com/apache/iceberg/issues/6420> with a common view and a
> linked storage table seems to be the most promising going forward. I
> therefore put together an initial proposal for a specification.
>
> I realize that my proposal deviates from the design Trino is using at the
> moment. I want to point out that this is just a proposal and is open for
> discussion. The changes I suggest are twofold:
>
>    1. Change
>
> Register only the view in the iceberg catalog. Currently the common view
> as well as the storage table are registered in the catalog. To assure
> atomic transaction for the storage table I propose the following commit
> procedure:
>
> Writers changing the storage table state create table metadata files
> optimistically, assuming that the current metadata location will not be
> changed
> before the writer’s commit. The commit is performed in two steps. First,
> the
> Writer creates a new view metadata file optimistically and changes the
> storage
> table pointer to the new location. Second, the new view metadata file gets
> committed by swapping the view’s metadata file pointer in the metastore
> from the
> base location to its new location. The commit is only considered
> successful when the second
> step succeeds.
>
>    1. Change
>
> A different format for storing the refresh information. The proposed
> metadata captures more information and allows easier additions in the
> future.
>
> I have summarized the changes in the proposal at the end of the email. I
> would be glad if you could have a look.
>
> Best wishes,
>
> Jan
> Design 1 Overview
>
> MVs (Materialized views) are realized as a combination of an iceberg
> common view
> with an underlying storage table. The definition of the materialized view
> is
> stored in the common view. The precomputed data is stored in an iceberg
> table
> called storage table. The information required for refresh operations is
> stored
> as a property in the storage table. All changes to either the view or the
> storage table state create a new view metadata file and completely replace
> the
> old view metadata file using an atomic swap. Like Iceberg tables and
> views, this
> atomic swap is delegated to the metastore that tracks tables and views by
> name.
> Metadata Location
>
> An atomic swap of one view metadata file for another provides the basis for
> making atomic changes. Readers use the version of the view that was
> current when
> they loaded the view metadata and are not affected by changes until they
> refresh
> and pick up a new metadata location.
>
> Writers distinguish between changing the view or the storage table state.
>
> Writers changing the view state create view metadata files optimistically,
> assuming that the current metadata location will not be changed before the
> writer’s commit. Once a writer has created an update, it commits by
> swapping the
> view’s metadata file pointer from the base location to the new location.
>
> Writers changing the storage table state create table metadata files
> optimistically, assuming that the current metadata location will not be
> changed
> before the writer’s commit. The commit is performed in two steps. First,
> the
> Writer creates a new view metadata file optimistically and changes the
> storage
> table pointer to the new location. Second, the new view metadata file gets
> committed by swapping the view’s metadata file pointer in the metastore
> from the
> base location to its new location. The commit is only successful when the
> second
> step succeeds.
> Specification (DRAFT!)
>
> The metadata of the materialized view is comprised of four parts. The view
> and
> the storage table metadata constitute one part each. Since not all
> information
> can be stored inside the view and storage table metadata, two additional
> parts
> are introduced in the properties field of the view and storage table
> metadata
> respectively.
> Materialized view metadata stored in the common view properties
>
> One part of the materialized view metadata is stored inside the properties
> field of the common view. The metadata is stored in JSON format under the
> key
> “materialized_view_metadata”. The materialized view metadata stored in the
> view
> has the following schema.
> v1 Field Name Description
> *required* *storage-table-location* Path to the metadata file of the
> storage table.
> *optional* *allow-stale-data* Boolean that defines the query engine
> behavior in case the base tables indicate the precomputed data isn’t fresh.
> If set to FALSE, a refresh operation has to be performed before the query
> results are returned. If set to TRUE the data in the storage table gets
> returned without performing a refresh operation. If field is not set,
> defaults to FALSE.
> *optional* *refresh-strategy* Possible values are: full: Full storage
> table refresh, incremental: Incremental table refresh. If field is not
> set, defaults to full Materialized view metadata stored in the storage
> table properties
>
> Another part of the materialized view metadata is stored inside the
> properties
> field of the storage table. The metadata is stored in JSON format under
> the key
> “materialized_view_metadata”. The materialized view metadata stored in the
> storage table has the following schema.
> v1 Field Name Description
> *required* *refreshes* A list of refresh operations.
> *required* *current-refresh-id* Id of the last refresh operation that
> defines the current state of the data files. *Refreshes*
>
> Refresh information is stored as a list of refresh operation records. Each
> refresh operation has the following structure:
> v1 Field Name Description
> *required* *refresh-id* ID of the refresh operation.
> *required* *version-id* Version id of the materialized view when the
> refresh operation was performed.
> *required* *base-tables* A List of base-table records.
> *optional* *sequence-number* Sequence number of the snapshot that
> contains the refreshed data files.
>
> Refreshes could be handled in different ways. For a normal execution the
> refresh
> list could consist of only one entry, which gets overwritted on every
> refresh
> operation. If “timetravel” is enabled for the materialized view, a new
> refresh operation record gets inserted into the list on every refresh.
> Together with the sequence-number field, this could be used to track the
> evolution of data files over the refresh history.
> *Base table*
>
> A base table record can have different forms based on the common field
> “type”.
> The other fields don’t necessarily have to be the same.
> Iceberg-Metastore
> v1 Field Name Description
> *required* *type* type=”iceberg-metastore”
> *required* *identifier* Identifier in the SQL expression.
> *required* *snapshot-reference* Snapshot id of the base table when the
> refresh operation was performed.
> *optional* *properties* A string to string map of base table properties.
> Could be used to specify a different metastore. Iceberg-FileSystem
> v1 Field Name Description
> *required* *type* type=”iceberg-filesystem”
> *required* *identifier* Identifier in the SQL expression.
> *required* *location* Path to the directory of the base table.
> *required* *snapshot-reference* Snapshot id of the base table when the
> refresh operation was performed.
> *optional* *properties* A string to string map of base table properties.
> Could be used for a different storage system. DeltaLake-FileSystem
> (optional)
> v1 Field Name Description
> *required* *type* type=”deltalake-filesystem”
> *required* *identifier* Identifier in the SQL expression.
> *required* *location* Path to the directory of the base table.
> *required* *snapshot-reference* Delta table version of the base table
> when the refresh operation was performed.
> *optional* *properties* A string to string map of base table properties.
> Could be used for a different storage system.
>


-- 
Ryan Blue
Tabular

Reply via email to