Walaa, I think Ryan's comment was more in relation to the combined metadata approach (which is not the current proposal). I don't think anything that's being discussed at this point helps or hurts the view integration story with catalog apis.
-Dan On Mon, May 20, 2024 at 1:21 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote: > Thanks Dan! Do you share the concern of having to update engine APIs as > well before adopting this? Also Ryan had a concern [1] on the number of > backends that need to be updated. > > [1] > https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1#heading=h.rbxigxsh4rfw > > Thanks, > Walaa. > > > On Mon, May 20, 2024 at 11:14 AM Daniel Weeks <dwe...@apache.org> wrote: > >> I know I'm coming in late here, but I'm still working through all the >> prior discussion. Here are my thoughts so far: >> >> I agree that Jan's doc has some really good context and we should >> continue from there, but we should remove discussion of options as it just >> creates confusion. We can reference other material from previous >> discussions (mailing lists, other docs/prior version, etc), but the focus >> should be on what we feel the final product will look like as opposed to >> the path it took to get there. Let's make sure that the proposal issue >> <https://github.com/apache/iceberg/issues/10043> is kept up to date and >> is the primary reference point going forward. >> >> As to the question of using properties or updating the view metadata, I >> agree with Szehon and others who expressed concern about using properties. >> There are a number of issues I see with standardizing around properties in >> the table or view definition. The first is that some of the structures may >> get complicated. I'm specifically concerned about lineage information as >> there can be many table references and possibly even view references that >> will need to contain additional information. The second issue is that >> table and view properties persist across versions, so rollback and even >> handling of those properties would require careful implementation. 
The >> third issue is that once you've standardized on properties, it is difficult >> to change direction and any future changes will likely need to be handled >> through properties as well. This creates an awkward scenario where you're >> building a specification around something that isn't particularly well >> defined or structured. >> >> I also don't feel there's a significant difference in the effort involved >> in updating the metadata representations because we need to define the same >> properties regardless (most of the work is around the handling, not how the >> values are defined). While using properties may seem expedient, I don't >> think it's really going to move things along much faster and it will be more >> difficult in the long term. The one caveat is that I believe we may be >> able to limit the changes to the view spec and avoid needing a table spec >> update. I've added my comments to the doc to that effect. >> >> -Dan >> >> >> >> On Fri, May 17, 2024 at 3:16 PM Walaa Eldin Moustafa < >> wa.moust...@gmail.com> wrote: >> >>> I am not in favor of expanding the spec for use cases that do not >>> directly serve materialized views. Identifying general lineage is a >>> separate problem that is also applicable to non-materialized views so maybe >>> that’s worth discussing in a separate spec. If there is a use case for >>> timestamp or snapshot level properties for materialized views, we can >>> discuss them but so far I feel they are redundant. What do you think? >>> >>> Thanks, >>> Walaa. >>> >>> On Fri, May 17, 2024 at 3:05 PM Benny Chow <btc...@gmail.com> wrote: >>> >>>> I think it’s still worthwhile to include the snapshot and timestamp >>>> refs for completeness' sake. >>>> >>>> Also, Jan brought up an interesting use case with a BI tool using the MV >>>> without a SQL representation. The BI tool can get all table and view >>>> dependencies if the lineage is complete.
>>>> >>>> Thanks >>>> >>>> >>>> On May 17, 2024, at 1:35 PM, Walaa Eldin Moustafa < >>>> wa.moust...@gmail.com> wrote: >>>> >>>> >>>> >>>> Sounds good. I am assuming we agree it is not required for either >>>> snapshot or timestamp? >>>> >>>> Thanks, >>>> Walaa. >>>> >>>> >>>> On Fri, May 17, 2024 at 1:17 PM Benny Chow <btc...@gmail.com> wrote: >>>> >>>>> I like Jack's suggestions to capture the ref type and value! When the >>>>> ref type is branch, the snapshot id is dynamic and so the engine using the >>>>> MV can validate that the latest snapshot on a branch matches the branch >>>>> snapshot at the time of materialization. >>>>> >>>>> I think if we do this then we don't need to precisely identify the >>>>> same table (at different snapshots) in the MV's query tree. So, we don't >>>>> need to capture any additional properties like alias, parent view, path to >>>>> root, sequence number, etc. >>>>> >>>>> Thanks >>>>> Benny >>>>> >>>>> On Fri, May 17, 2024 at 11:20 AM Walaa Eldin Moustafa < >>>>> wa.moust...@gmail.com> wrote: >>>>> >>>>>> Thanks Jack, and welcome back! >>>>>> >>>>>> Taking a step back, I understand the initial concern was that a table >>>>>> name (e.g., t1 in your example) would be referenced multiple times in the >>>>>> view definition and each reference is associated with a different >>>>>> snapshot >>>>>> ID, hence UUID is not sufficient to capture each occurrence/reference. I >>>>>> proposed: >>>>>> >>>>>> * The solution to track unique occurrences is to use something along >>>>>> the lines of the SQL alias (e.g., "t1" for the first occurrence and "t2" >>>>>> from "as t2" in your example) to uniquely identify each occurrence -- we >>>>>> can tweak the representation and explore how to handle this in case of >>>>>> nested queries, etc, but alias is the main concept to track uniqueness. 
>>>>>> * However, since this leads to a series of open ended problems, I >>>>>> have also suggested avoiding this complexity altogether and not >>>>>> supporting >>>>>> time travel in MVs for now. >>>>>> >>>>>> However, thinking again, are not time travel queries in MVs >>>>>> self-containing the exact snapshot ID that we are trying to track in the >>>>>> properties? Looks like this information is already encoded in the query >>>>>> and >>>>>> there is no need to capture it externally. >>>>>> >>>>>> For example, if the MV definition consists of table references where >>>>>> all of the references are bound to specific snapshot IDs or timestamps, >>>>>> then the storage table is always fresh no matter if the underlying tables >>>>>> change. Tracking snapshot IDs in the storage table is only required for >>>>>> table references that are not pinned to a specific snapshot ID/timestamp >>>>>> in >>>>>> the view definition, for which UUID is sufficient. >>>>>> >>>>>> Thanks, >>>>>> Walaa. >>>>>> >>>>>> >>>>>> On Fri, May 17, 2024 at 9:51 AM Jack Ye <yezhao...@gmail.com> wrote: >>>>>> >>>>>>> Hi everyone, just want to say I am back from the leave, and >>>>>>> currently catching up with the threads, I will make more comments later >>>>>>> after knowing more details of what has been going on. Looks like we've >>>>>>> made >>>>>>> great progress! >>>>>>> >>>>>>> Just my 2 cents on the current properties vs metadata field >>>>>>> discussion. The proposed properties are: >>>>>>> - in view: >>>>>>> 1. a boolean flag to indicate a view is a MV >>>>>>> 2. a pointer to the storage table >>>>>>> - in storage table: >>>>>>> 3. view version that is materialized >>>>>>> 4. a prefix-based map to describe the snapshot version of the base >>>>>>> tables that are materialized >>>>>>> 5. a prefix-based map to describe the version of child views that >>>>>>> are materialized >>>>>>> >>>>>>> For 1, 2, and 3, these are all pretty simple and can be just >>>>>>> properties. 
I guess 4 and 5 are the main ones that seem complex and can >>>>>>> be >>>>>>> more formalized as metadata fields. I think the time travel cases Benny >>>>>>> brought up might be good ones to look into in more detail: >>>>>>> >>>>>>> For direct version travel, I think the base table version serves as >>>>>>> the default. If you have an MV query like >>>>>>> >>>>>>> SELECT * FROM >>>>>>> t1, >>>>>>> t1 FOR SYSTEM_VERSION AS OF 987654 as t2 >>>>>>> WHERE t1.c1 = t2.c1 >>>>>>> >>>>>>> and in the storage table it says t1 maps to snapshot id 123456, then >>>>>>> the query is still not ambiguous; it should be interpreted as >>>>>>> >>>>>>> SELECT * FROM >>>>>>> t1 FOR SYSTEM_VERSION AS OF 123456, >>>>>>> t1 FOR SYSTEM_VERSION AS OF 987654 as t2 >>>>>>> WHERE t1.c1 = t2.c1 >>>>>>> >>>>>>> For ref travel, the specific ref version needs to be fixed at MV >>>>>>> creation time: >>>>>>> >>>>>>> SELECT * FROM >>>>>>> t1, >>>>>>> t1 FOR SYSTEM_VERSION AS OF '2024-Q1' as t2 >>>>>>> WHERE t1.c1 = t2.c1 >>>>>>> >>>>>>> Just storing table UUID is not sufficient. In a property-based >>>>>>> approach, we need something like >>>>>>> base.table.<table>.ref.<ref-name>=<snapshot-id>. >>>>>>> >>>>>>> Time travel is similar to ref travel: >>>>>>> >>>>>>> SELECT * FROM >>>>>>> t1, >>>>>>> t1 FOR SYSTEM_TIME AS OF timestamp '2024-01-01' as t2 >>>>>>> WHERE t1.c1 = t2.c1 >>>>>>> >>>>>>> In a property-based approach, we need something like >>>>>>> base.table.<table>.time.<timestamp>=<snapshot-id>. >>>>>>> >>>>>>> Technically this is indeed getting increasingly complex, so I can >>>>>>> get why many of us say this property-based approach is quite brittle. >>>>>>> However, it seems like it can still work as we extend the property >>>>>>> structure. Personally speaking I am leaning more towards the >>>>>>> property-based >>>>>>> approach just for its simplicity, but I need to think more about other >>>>>>> use >>>>>>> cases as well.
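To make the property layout in the message above concrete, here is a minimal Python sketch. It assumes the two key shapes quoted there (`base.table.<table>.ref.<ref-name>` and `base.table.<table>.time.<timestamp>`). The bare `base.table.<table>` default key, the helper names, and every snapshot ID other than 123456 are hypothetical and are not taken from the spec PR.

```python
from typing import Dict, Optional

# Assumed prefix, following the key shapes quoted in the message above.
PREFIX = "base.table."


def default_key(table: str) -> str:
    """Key for an unpinned reference to a base table (hypothetical shape)."""
    return f"{PREFIX}{table}"


def ref_key(table: str, ref_name: str) -> str:
    """Key for a reference pinned to a branch or tag name."""
    return f"{PREFIX}{table}.ref.{ref_name}"


def time_key(table: str, timestamp: str) -> str:
    """Key for a reference pinned to a timestamp."""
    return f"{PREFIX}{table}.time.{timestamp}"


def resolve_snapshot(
    props: Dict[str, str],
    table: str,
    ref_name: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> Optional[int]:
    """Return the snapshot ID recorded at materialization time, if any."""
    if ref_name is not None:
        key = ref_key(table, ref_name)
    elif timestamp is not None:
        key = time_key(table, timestamp)
    else:
        key = default_key(table)
    value = props.get(key)
    return int(value) if value is not None else None


# Properties as they might appear on the storage table (values are made up,
# except 123456, which is the default mapping used in the example above).
storage_table_properties = {
    default_key("t1"): "123456",
    ref_key("t1", "2024-Q1"): "555555",
    time_key("t1", "2024-01-01"): "444444",
}
```

One thing the sketch makes visible: a table identifier that itself contains `.` (for example `db.t1`) makes such keys ambiguous to parse back, which is one concrete form of the brittleness raised in this thread.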
>>>>>>> >>>>>>> Best, >>>>>>> Jack Ye >>>>>>> >>>>>>> >>>>>>> On Thu, May 16, 2024 at 10:21 PM Walaa Eldin Moustafa < >>>>>>> wa.moust...@gmail.com> wrote: >>>>>>> >>>>>>>> I think this is orthogonal to the property vs metadata field since >>>>>>>> instead of representing the property as `base.table.[UUID]` it would be >>>>>>>> something like `base.table.[alias]` where `alias` is the specific >>>>>>>> occurrence of the table in the query according to its alias (and SELECT >>>>>>>> scope possibly, which kind of opens the door to further complexities, >>>>>>>> but >>>>>>>> for the sake of argument -- there is a mapping to properties too). >>>>>>>> >>>>>>>> Another question: assuming we go with the top level metadata model, >>>>>>>> will we still expose this metadata on the engine side as properties? >>>>>>>> What >>>>>>>> would the property names be? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Walaa. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, May 16, 2024 at 9:55 PM Benny Chow <btc...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Sounds good. >>>>>>>>> >>>>>>>>> Another benefit of the struct model is that it's more extensible >>>>>>>>> in the future when we need to disambiguate the same table that appears >>>>>>>>> multiple times in the MV query tree. >>>>>>>>> This could happen with time travel queries or branching. We may >>>>>>>>> end up adding additional properties like a sequence number, parent >>>>>>>>> view or >>>>>>>>> path to root. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> On Thu, May 16, 2024 at 3:57 PM Walaa Eldin Moustafa < >>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Benny, I have responded to the comment. >>>>>>>>>> >>>>>>>>>> I would suggest that we use this thread to evaluate properties >>>>>>>>>> model vs top level metadata model (to avoid discussion drift). >>>>>>>>>> >>>>>>>>>> If we have feedback on the actual properties used in the >>>>>>>>>> properties model as defined in the PR, we can have the discussion >>>>>>>>>> there. 
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Walaa. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, May 16, 2024 at 3:22 PM Benny Chow <btc...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Walaa >>>>>>>>>>> >>>>>>>>>>> I left comments in your spec PR: >>>>>>>>>>> https://github.com/apache/iceberg/pull/10280#pullrequestreview-2061922169 >>>>>>>>>>> My last question about use cases was really about incremental >>>>>>>>>>> refresh with >>>>>>>>>>> aggregates. But I think this might be too complicated to try to >>>>>>>>>>> model/discuss now and so I agree with Micah's comment about doing >>>>>>>>>>> it in a >>>>>>>>>>> future iteration. >>>>>>>>>>> >>>>>>>>>>> Hi Jan, >>>>>>>>>>> >>>>>>>>>>> Regarding storing the identifiers, I like the idea too. >>>>>>>>>>> Dremio's query engine supports MVs on sources besides Iceberg >>>>>>>>>>> tables. >>>>>>>>>>> Here's everything that's in a single lineage entry: >>>>>>>>>>> https://github.com/dremio/dremio-oss/blob/master/services/accelerator/src/main/protobuf/reflection.proto#L80 >>>>>>>>>>> The lineage is stored as a graph and not a list of entries. I >>>>>>>>>>> think for >>>>>>>>>>> what we are trying to achieve, it's more practical to limit the >>>>>>>>>>> lineage to >>>>>>>>>>> Iceberg sources. >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Benny >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, May 15, 2024 at 12:06 AM Jan Kaul >>>>>>>>>>> <jank...@mailbox.org.invalid> wrote: >>>>>>>>>>> >>>>>>>>>>>> I agree with Szehon and Benny that storing the lineage >>>>>>>>>>>> information as multiple table properties is too brittle, >>>>>>>>>>>> especially for >>>>>>>>>>>> many source tables (base tables). I would prefer to have the whole >>>>>>>>>>>> lineage >>>>>>>>>>>> information in one entry as it is more concise.
This is also how >>>>>>>>>>>> Trino has >>>>>>>>>>>> been doing it, as you can see here >>>>>>>>>>>> <https://github.com/trinodb/trino/blob/212455d3e1d393f58cbc395d2b9da47ed8f23dd8/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L2915> >>>>>>>>>>>> . >>>>>>>>>>>> >>>>>>>>>>>> As we've discussed in the google doc >>>>>>>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit#heading=h.60qmzug7bzxc>: >>>>>>>>>>>> it is helpful to also store the table identifiers of the source >>>>>>>>>>>> tables to >>>>>>>>>>>> enable clients to determine the freshness of the MV that don't >>>>>>>>>>>> understand >>>>>>>>>>>> the SQL dialect of the MV definition, like other query engines, BI >>>>>>>>>>>> tools >>>>>>>>>>>> and Dataframe libraries. This is also how Trino is doing it. >>>>>>>>>>>> That's why we >>>>>>>>>>>> chose the design in the google doc. >>>>>>>>>>>> >>>>>>>>>>>> Storing the storage table identifier as a property might work. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, Jan >>>>>>>>>>>> On 15.05.24 02:38, Walaa Eldin Moustafa wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks Benny. My specific thoughts about the spec and the >>>>>>>>>>>> properties are captured in the spec PR >>>>>>>>>>>> https://github.com/apache/iceberg/pull/10280. The spec is also >>>>>>>>>>>> implemented in the Spark implementation PR >>>>>>>>>>>> https://github.com/apache/iceberg/pull/9830, and I believe >>>>>>>>>>>> this follows the same nature of how the information was captured in >>>>>>>>>>>> Netflix's implementation with Spark, and Trino implementation >>>>>>>>>>>> (prior to >>>>>>>>>>>> formalizing through that spec), both of which have been used >>>>>>>>>>>> reliably for >>>>>>>>>>>> years. I think it also aligns with Ryan's feedback here >>>>>>>>>>>> https://github.com/apache/iceberg/issues/6420#issuecomment-1369280546 >>>>>>>>>>>> which >>>>>>>>>>>> indicated the usage of properties. 
>>>>>>>>>>>> >>>>>>>>>>>> The reasons for choosing properties: >>>>>>>>>>>> * Not every table is a storage table and not every view is a >>>>>>>>>>>> materialized view. I feel exposing the info as top level metadata >>>>>>>>>>>> is >>>>>>>>>>>> overkill for the original object type. >>>>>>>>>>>> * The properties are simple. They contain either a single >>>>>>>>>>>> snapshot ID each, or a single view version each, or lastly, the >>>>>>>>>>>> storage table >>>>>>>>>>>> identifier. Engines can use them without issues (also as shown in >>>>>>>>>>>> the >>>>>>>>>>>> implementation). >>>>>>>>>>>> * To be meaningful, the metadata fields should be captured in >>>>>>>>>>>> the engine API as well, which is an effort that has to be pursued >>>>>>>>>>>> outside >>>>>>>>>>>> the Iceberg community. Until engine APIs evolve, we would have to >>>>>>>>>>>> define a >>>>>>>>>>>> mapping between Iceberg metadata fields and engine properties >>>>>>>>>>>> (the only current >>>>>>>>>>>> place on the engine side to capture this info). This requires an >>>>>>>>>>>> additional >>>>>>>>>>>> spec on its own, and it will introduce complexities. Hence it is >>>>>>>>>>>> always >>>>>>>>>>>> cleaner to map Iceberg properties to engine properties and Iceberg >>>>>>>>>>>> metadata >>>>>>>>>>>> to designated engine APIs. Mixing and matching (e.g., Iceberg >>>>>>>>>>>> metadata >>>>>>>>>>>> fields as engine properties) just for the lack of other cleaner >>>>>>>>>>>> options >>>>>>>>>>>> does not sound like a good idea in either the short or the long term. >>>>>>>>>>>> >>>>>>>>>>>> Let me know your thoughts. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Walaa. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 14, 2024 at 5:12 PM Benny Chow <btc...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I agree with Szehon here. I think storing the materialization >>>>>>>>>>>>> lineage as a bunch of properties is brittle.
This lineage >>>>>>>>>>>>> information is >>>>>>>>>>>>> needed by engines to validate the staleness of a materialization >>>>>>>>>>>>> and also >>>>>>>>>>>>> to perform full or incremental refreshes. There’s a lot to >>>>>>>>>>>>> capture here. >>>>>>>>>>>>> >>>>>>>>>>>>> Maybe we should drill down into the use cases first - such as >>>>>>>>>>>>> incremental refresh with aggregates? (Pick a harder one first 😀) >>>>>>>>>>>>> >>>>>>>>>>>>> I left a few comments about this in the doc. I wonder what >>>>>>>>>>>>> your thoughts are here, Walaa? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>>> On May 14, 2024, at 4:20 PM, Walaa Eldin Moustafa < >>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks John. The current metadata does not sound complex. We >>>>>>>>>>>>> need to track the underlying table snapshot IDs as well as the >>>>>>>>>>>>> view version >>>>>>>>>>>>> ID. I agree as long as it is simple and before this feature fully >>>>>>>>>>>>> matures, >>>>>>>>>>>>> we should track it in properties. >>>>>>>>>>>>> >>>>>>>>>>>>> One important factor for me (apart from the API effort, >>>>>>>>>>>>> especially on the engine side), is that not every table is an MV >>>>>>>>>>>>> storage >>>>>>>>>>>>> table. Surfacing MV-specific storage table properties as first >>>>>>>>>>>>> class table >>>>>>>>>>>>> metadata seems to impose this metadata on every table, when it >>>>>>>>>>>>> is not >>>>>>>>>>>>> required for normal table operation (yes, they can be optional, >>>>>>>>>>>>> but it does >>>>>>>>>>>>> not sound like the use case warrants exposing them as metadata >>>>>>>>>>>>> fields yet). >>>>>>>>>>>>> >>>>>>>>>>>>> Similarly, since not every view is a materialized view, it >>>>>>>>>>>>> sounds reasonable to keep MV-specific data in properties. >>>>>>>>>>>>> >>>>>>>>>>>>> UUID (for views), on the other hand, is common to all views, >>>>>>>>>>>>> hence it made sense to add it as a top level field.
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Walaa. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 14, 2024 at 1:01 PM John Zhuge <jzh...@apache.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Szehon, >>>>>>>>>>>>>> >>>>>>>>>>>>>> While I fully share your concern about abusing table properties, >>>>>>>>>>>>>> we took the approach of option 1 and ran it in production for >>>>>>>>>>>>>> several years: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - the feature was still evolving >>>>>>>>>>>>>> - quick and simple implementation >>>>>>>>>>>>>> - table properties are simple enough and not confusing >>>>>>>>>>>>>> - haven't seen an urgent need to convert the properties >>>>>>>>>>>>>> to metadata fields and add API >>>>>>>>>>>>>> - do not wish on-disk changes (requiring lengthy >>>>>>>>>>>>>> tedious migration) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> That said, I am open to codifying the MV metadata into API >>>>>>>>>>>>>> and spec, with the following considerations: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - MV metadata has reached maturity and consensus (could >>>>>>>>>>>>>> be just a core portion) >>>>>>>>>>>>>> - when MV metadata becomes too complex >>>>>>>>>>>>>> - wish to support use cases that are quicker to adopt API >>>>>>>>>>>>>> changes (than engines), e.g., using Iceberg library to >>>>>>>>>>>>>> manipulate MVs, or >>>>>>>>>>>>>> parsing metadata files directly >>>>>>>>>>>>>> - Spark view catalog API can evolve separately from >>>>>>>>>>>>>> Iceberg API and spec changes >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks all for the great discussion! >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 10, 2024 at 10:48 PM Walaa Eldin Moustafa < >>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Szehon, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the follow-up. It is possible some of the >>>>>>>>>>>>>>> concerns were referring to the backend catalogs, but it is all >>>>>>>>>>>>>>> connected.
>>>>>>>>>>>>>>> My main personal concern is from the engine connector APIs' >>>>>>>>>>>>>>> point of >>>>>>>>>>>>>>> view, but I share the concern about the catalogs. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think everyone's concern is not about the complexity *per* >>>>>>>>>>>>>>> backend catalog/engine catalog API (in which case adding new >>>>>>>>>>>>>>> metadata to >>>>>>>>>>>>>>> existing objects could require less code), but rather about the >>>>>>>>>>>>>>> *number* of APIs and implementations that need to change >>>>>>>>>>>>>>> (in which case both new metadata to existing objects and new >>>>>>>>>>>>>>> objects >>>>>>>>>>>>>>> altogether introduce equal complexity). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Walaa. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, May 10, 2024 at 10:31 AM Szehon Ho < >>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Walaa >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> OK thanks for confirming. I am still not 100% in >>>>>>>>>>>>>>>> agreement; my understanding of the rationale for separate >>>>>>>>>>>>>>>> Table/View >>>>>>>>>>>>>>>> objects in the comment that you linked: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the biggest problem with this is that we would need >>>>>>>>>>>>>>>>> to modify every catalog to support this combination and that >>>>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>>>> really difficult. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is that it is about Java Catalogs/REST Catalog needing to support >>>>>>>>>>>>>>>> creating, persisting, and loading a MaterializedView object, >>>>>>>>>>>>>>>> which is much >>>>>>>>>>>>>>>> more complex. See the HiveView PR for example: >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/9852 We would >>>>>>>>>>>>>>>> have to do the same exercise for persisting an MV.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In our case though, there's not much complexity regardless >>>>>>>>>>>>>>>> of approach ('properties' or new metadata fields), in terms of >>>>>>>>>>>>>>>> Java >>>>>>>>>>>>>>>> Catalog/REST Catalog. It's mostly pass-through to storage. >>>>>>>>>>>>>>>> Looks like you >>>>>>>>>>>>>>>> are referring to Spark's View model in terms of complexity, >>>>>>>>>>>>>>>> which may be a >>>>>>>>>>>>>>>> different story, but I'm not sure if it is a good rationale to >>>>>>>>>>>>>>>> make Iceberg >>>>>>>>>>>>>>>> use 'properties'. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 'properties' is for read/write configurations, not to save >>>>>>>>>>>>>>>> metadata. To me, it's also brittle to save important >>>>>>>>>>>>>>>> metadata there, as it's not >>>>>>>>>>>>>>>> in the defined schema. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> A string to string map of table properties. This is used to >>>>>>>>>>>>>>>>> control settings that affect reading and writing and is not >>>>>>>>>>>>>>>>> intended to be >>>>>>>>>>>>>>>>> used for arbitrary metadata. For example, >>>>>>>>>>>>>>>>> commit.retry.num-retries is used to control the number of >>>>>>>>>>>>>>>>> commit retries. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On the other hand, the Draft Spec suggests saving >>>>>>>>>>>>>>>> `lineage` as a modeled field on the Storage Table's snapshot >>>>>>>>>>>>>>>> metadata. >>>>>>>>>>>>>>>> This allows you to 'time travel', 'branch', and have this >>>>>>>>>>>>>>>> metadata life >>>>>>>>>>>>>>>> cycle integrated via normal snapshot lifecycle operations. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So that's my rationale. Not sure if we can come to an >>>>>>>>>>>>>>>> agreement over email though, and may need others to chime in >>>>>>>>>>>>>>>> as well.
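The point above about snapshot-scoped lineage can be illustrated with a small Python sketch. It is only an illustration under assumed names: `SourceState`, `StorageSnapshot`, `is_fresh`, the `db.t1` identifier, and all snapshot IDs are hypothetical and do not come from the Draft Spec or the Iceberg API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SourceState:
    """One lineage entry: a source table and the snapshot that was read."""
    identifier: str
    snapshot_id: int


@dataclass(frozen=True)
class StorageSnapshot:
    """A storage-table snapshot that carries its own lineage (assumed model)."""
    snapshot_id: int
    lineage: Tuple[SourceState, ...]


def is_fresh(snapshot: StorageSnapshot, current: Dict[str, int]) -> bool:
    """True when every source is still at the snapshot recorded in the lineage."""
    return all(
        current.get(source.identifier) == source.snapshot_id
        for source in snapshot.lineage
    )


# Two refreshes of the storage table, each recording what it read.
history = {
    1: StorageSnapshot(1, (SourceState("db.t1", 100),)),
    2: StorageSnapshot(2, (SourceState("db.t1", 200),)),
}

# Current state of the source table.
current_sources = {"db.t1": 200}

print(is_fresh(history[2], current_sources))
print(is_fresh(history[1], current_sources))
```

Because each storage snapshot carries its own lineage in this model, rolling the storage table back to snapshot 1 also brings back the lineage that was valid for it. A table-level property map would need separate handling on rollback, since properties persist across versions.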
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>> Szehon >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 11:58 PM Walaa Eldin Moustafa < >>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Szehon, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, you are reading the PR correctly, and interpreting >>>>>>>>>>>>>>>>> the meaning of properties correctly. I think the reply you >>>>>>>>>>>>>>>>> pasted from Ryan >>>>>>>>>>>>>>>>> refers to the same concept as well. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For the initial Google doc and the issue (by the way it is >>>>>>>>>>>>>>>>> an issue, not a PR), yes both are proposing new metadata >>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The references I made to the modeling doc [1, 2] are >>>>>>>>>>>>>>>>> reasons why new APIs are not desired. The cons/concerns >>>>>>>>>>>>>>>>> applicable to new >>>>>>>>>>>>>>>>> MV metadata apply by extension to new table and view metadata >>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The reason why new metadata adds complexity is that this >>>>>>>>>>>>>>>>> new metadata needs to be propagated to the engine API. For >>>>>>>>>>>>>>>>> example, here is >>>>>>>>>>>>>>>>> the ViewInfo [3] class in the Spark catalog, which is used in >>>>>>>>>>>>>>>>> view methods >>>>>>>>>>>>>>>>> like createView. Its fields correspond with the Iceberg >>>>>>>>>>>>>>>>> metadata. Adding >>>>>>>>>>>>>>>>> new Iceberg fields should be accompanied with new fields in >>>>>>>>>>>>>>>>> the engine >>>>>>>>>>>>>>>>> catalog/connector APIs, which was a major reason for >>>>>>>>>>>>>>>>> rejecting the combined >>>>>>>>>>>>>>>>> MV object model as well. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABK7e3QB4 >>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIonvCGE >>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>> https://github.com/apache/spark/blob/2df494fd4e4e64b9357307fb0c5e8fc1b7491ac3/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewInfo.java#L45 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Walaa. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 11:30 PM Szehon Ho < >>>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Walaa >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As there may be confusion in the word 'properties', I >>>>>>>>>>>>>>>>>> want to double check if we are talking about the same thing >>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I am reading your PR as adding lineage metadata as new >>>>>>>>>>>>>>>>>> key/value pair under the storage Table's 'properties' field: >>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/blob/main/format/spec.md?plain=1#L677 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *optional* *optional* *properties* A string to string >>>>>>>>>>>>>>>>>> map of table properties. This is used to control settings >>>>>>>>>>>>>>>>>> that affect >>>>>>>>>>>>>>>>>> reading and writing and is not intended to be used for >>>>>>>>>>>>>>>>>> arbitrary metadata. >>>>>>>>>>>>>>>>>> For example, commit.retry.num-retries is used to control >>>>>>>>>>>>>>>>>> the number of commit retries. 
>>>>>>>>>>>>>>>>>> and adding the Storage Table pointer as a key/value pair in the >>>>>>>>>>>>>>>>>> View's 'properties' field: >>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/blob/main/format/view-spec.md?plain=1#L65 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *optional* properties A string to string map of view >>>>>>>>>>>>>>>>>> properties [2] >>>>>>>>>>>>>>>>>> Is that correct? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On the other hand, I was talking about adding this >>>>>>>>>>>>>>>>>> metadata as actual fields, as is described in the Draft Spec >>>>>>>>>>>>>>>>>> of the Design >>>>>>>>>>>>>>>>>> Doc >>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> first PR https://github.com/apache/iceberg/issues/6420 . >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Do you mean that the vote means we cannot model new fields >>>>>>>>>>>>>>>>>> like 'materialization' and 'lineage' as was proposed there? >>>>>>>>>>>>>>>>>> If that is >>>>>>>>>>>>>>>>>> the interpretation, I am not sure I agree. I don't fully see >>>>>>>>>>>>>>>>>> how new fields >>>>>>>>>>>>>>>>>> add more catalog implementation complexity over new >>>>>>>>>>>>>>>>>> key/value properties. >>>>>>>>>>>>>>>>>> To me, the vote seemed to just rule out using a combined >>>>>>>>>>>>>>>>>> catalog object >>>>>>>>>>>>>>>>>> (MaterializedView) in favor of re-using the Table and View >>>>>>>>>>>>>>>>>> metadata models, >>>>>>>>>>>>>>>>>> not to prevent change to the Table and View model. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>> Szehon >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 10:05 PM Walaa Eldin Moustafa < >>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Szehon, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think choosing separate view + table objects precludes >>>>>>>>>>>>>>>>>>> us from adding new metadata to table and view metadata.
>>>>>>>>>>>>>>>>>>> Here is one >>>>>>>>>>>>>>>>>>> relevant comment [1] from Ryan on the modeling doc, where >>>>>>>>>>>>>>>>>>> his point is that >>>>>>>>>>>>>>>>>>> we want to avoid introducing new APIs since it requires >>>>>>>>>>>>>>>>>>> updating every >>>>>>>>>>>>>>>>>>> catalog, and (quoting) even now, we have few >>>>>>>>>>>>>>>>>>> implementations that support >>>>>>>>>>>>>>>>>>> views because of the problems updating back ends. >>>>>>>>>>>>>>>>>>> Therefore, one of the >>>>>>>>>>>>>>>>>>> major reasons to avoid a new model with new metadata is to >>>>>>>>>>>>>>>>>>> avoid adding new >>>>>>>>>>>>>>>>>>> metadata, which introduces this complexity. Here is another >>>>>>>>>>>>>>>>>>> similar comment >>>>>>>>>>>>>>>>>>> from Renjie [2] on the cons listed for the combined object >>>>>>>>>>>>>>>>>>> approach. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Even Ryan's point on the MV issue that you referenced >>>>>>>>>>>>>>>>>>> reads to me as he is supportive of the property model. Here >>>>>>>>>>>>>>>>>>> are some quotes: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > We would still want some MV metadata in table >>>>>>>>>>>>>>>>>>> *properties*. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > I recommend instead reusing the existing snapshot >>>>>>>>>>>>>>>>>>> metadata structure to store what you need as snapshot >>>>>>>>>>>>>>>>>>> *properties*. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > First, I think we want to avoid keeping much state >>>>>>>>>>>>>>>>>>> information in complex table *properties*. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Again, here, he is supportive of table properties, but >>>>>>>>>>>>>>>>>>> wants to make sure that the information is simple. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > We may want additional metadata as well, like a UUID >>>>>>>>>>>>>>>>>>> to ensure we have the right view. I don't think we have a >>>>>>>>>>>>>>>>>>> UUID in the view >>>>>>>>>>>>>>>>>>> spec yet, but we could add one. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Here, he is very specific when it comes to new metadata >>>>>>>>>>>>>>>>>>> fields, and explicitly calls it out. That is the only new >>>>>>>>>>>>>>>>>>> metadata field in >>>>>>>>>>>>>>>>>>> that reply and by now it is already supported. It is also >>>>>>>>>>>>>>>>>>> not MV-specific. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hope this addresses your question on the property vs new >>>>>>>>>>>>>>>>>>> metadata model. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABK7e3QB4 >>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIonvCGE >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 5:49 PM Szehon Ho < >>>>>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Walaa, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I agree, I definitely do not want yet another PR/doc >>>>>>>>>>>>>>>>>>>> where discussion happens, as it's already quite spread out >>>>>>>>>>>>>>>>>>>> :) But I did >>>>>>>>>>>>>>>>>>>> want to clarify some points before we get started on the >>>>>>>>>>>>>>>>>>>> discussion on your >>>>>>>>>>>>>>>>>>>> PR. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> With reusing the table and view objects, we are not >>>>>>>>>>>>>>>>>>>>> changing the existing metadata of either table or view >>>>>>>>>>>>>>>>>>>>> spec but rather >>>>>>>>>>>>>>>>>>>>> introduce new properties and formalize them to express >>>>>>>>>>>>>>>>>>>>> materialized views >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On this point, I am not 100% sure that choosing to >>>>>>>>>>>>>>>>>>>> represent a MaterializedView as a separate View + Table >>>>>>>>>>>>>>>>>>>> object precludes us >>>>>>>>>>>>>>>>>>>> from adding to metadata of Table or View as the Draft Spec >>>>>>>>>>>>>>>>>>>> suggested, >>>>>>>>>>>>>>>>>>>> though. I think this point was discussed in Jan's initial >>>>>>>>>>>>>>>>>>>> PR with a good >>>>>>>>>>>>>>>>>>>> point from Ryan: >>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/6420#issuecomment-1369280546 >>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>> using Table Properties to track lineage is fairly brittle, >>>>>>>>>>>>>>>>>>>> and having it >>>>>>>>>>>>>>>>>>>> formalized in the Iceberg metadata is cleaner, and that >>>>>>>>>>>>>>>>>>>> was thus the >>>>>>>>>>>>>>>>>>>> direction of the Draft Spec in the design doc. What do >>>>>>>>>>>>>>>>>>>> people think? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>> Szehon >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 5:35 PM Walaa Eldin Moustafa < >>>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks Szehon. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The reason for the difference is that the proposal in >>>>>>>>>>>>>>>>>>>>> the Google doc is based on a new MV model, hence, new >>>>>>>>>>>>>>>>>>>>> metadata fields and a >>>>>>>>>>>>>>>>>>>>> new metadata model were being introduced (with types, >>>>>>>>>>>>>>>>>>>>> optionality, etc). 
>>>>>>>>>>>>>>>>>>>>> With reusing the table and view objects, we are not >>>>>>>>>>>>>>>>>>>>> changing the existing >>>>>>>>>>>>>>>>>>>>> metadata of either table or view spec but rather >>>>>>>>>>>>>>>>>>>>> introduce new properties >>>>>>>>>>>>>>>>>>>>> and formalize them to express materialized views. This >>>>>>>>>>>>>>>>>>>>> would be the answer >>>>>>>>>>>>>>>>>>>>> to most of the questions you posted on the PR (besides >>>>>>>>>>>>>>>>>>>>> some naming >>>>>>>>>>>>>>>>>>>>> questions, which I think should be straightforward). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> With that fundamental difference, we cannot lift and >>>>>>>>>>>>>>>>>>>>> shift what is in the doc to any PR. Further, having >>>>>>>>>>>>>>>>>>>>> consensus on separate >>>>>>>>>>>>>>>>>>>>> table and view objects contradicts the point being >>>>>>>>>>>>>>>>>>>>> made on having >>>>>>>>>>>>>>>>>>>>> consensus on the doc. We might have had agreements on >>>>>>>>>>>>>>>>>>>>> some elements, but >>>>>>>>>>>>>>>>>>>>> definitely not on the whole doc, as shown by the follow-ups >>>>>>>>>>>>>>>>>>>>> (also as a >>>>>>>>>>>>>>>>>>>>> community, not individuals). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Therefore: we need a new space to discuss the separate >>>>>>>>>>>>>>>>>>>>> table and view properties. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is the question whether to: >>>>>>>>>>>>>>>>>>>>> 1- Create a new doc >>>>>>>>>>>>>>>>>>>>> 2- Create a new PR? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I feel a PR is the most effective way, especially >>>>>>>>>>>>>>>>>>>>> given the fact that we discussed the topic a lot by now. >>>>>>>>>>>>>>>>>>>>> If we agree, we >>>>>>>>>>>>>>>>>>>>> can continue the discussion on the PR, else, we can >>>>>>>>>>>>>>>>>>>>> create a doc. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Walaa. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, May 9, 2024 at 4:39 PM Szehon Ho < >>>>>>>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks Walaa for driving it forward, looking forward >>>>>>>>>>>>>>>>>>>>>> to thinking about implementation of Materialized Views. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I see Jan's point, the PR spec change is similar but >>>>>>>>>>>>>>>>>>>>>> does not seem to be completely aligned with the Draft >>>>>>>>>>>>>>>>>>>>>> Spec in the design >>>>>>>>>>>>>>>>>>>>>> doc: >>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/ >>>>>>>>>>>>>>>>>>>>>> . I left comments on those sections of the PR with >>>>>>>>>>>>>>>>>>>>>> links to the >>>>>>>>>>>>>>>>>>>>>> difference. I think most of the Draft Spec proposal >>>>>>>>>>>>>>>>>>>>>> is still applicable >>>>>>>>>>>>>>>>>>>>>> after the decision to have separate Table and View >>>>>>>>>>>>>>>>>>>>>> objects. It would be >>>>>>>>>>>>>>>>>>>>>> interesting to at least drill a bit further into why we >>>>>>>>>>>>>>>>>>>>>> did not choose the >>>>>>>>>>>>>>>>>>>>>> approach in the Draft Spec and chose another way. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>>>> Szehon >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Wed, May 8, 2024 at 4:48 AM Jan Kaul >>>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Well, everybody who actively contributed to the >>>>>>>>>>>>>>>>>>>>>>> discussion on the original google doc was in consensus. >>>>>>>>>>>>>>>>>>>>>>> That's why I >>>>>>>>>>>>>>>>>>>>>>> brought up the topic at the Community Sync on >>>>>>>>>>>>>>>>>>>>>>> 2024-02-14 ( >>>>>>>>>>>>>>>>>>>>>>> https://youtu.be/uAQVGd5zV4I?t=890) to raise the >>>>>>>>>>>>>>>>>>>>>>> awareness of the broader community. 
After which the >>>>>>>>>>>>>>>>>>>>>>> discussion about the >>>>>>>>>>>>>>>>>>>>>>> storage model started. I don't think that the >>>>>>>>>>>>>>>>>>>>>>> discussion about a single >>>>>>>>>>>>>>>>>>>>>>> aspect of a proposal should invalidate all other >>>>>>>>>>>>>>>>>>>>>>> aspects of the proposal. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Regardless, the state of the proposal from the >>>>>>>>>>>>>>>>>>>>>>> original google doc contains a lot of valuable >>>>>>>>>>>>>>>>>>>>>>> contributions from Micah, >>>>>>>>>>>>>>>>>>>>>>> Szehon, Jack, Dan, yourself and others and it should at >>>>>>>>>>>>>>>>>>>>>>> least provide the >>>>>>>>>>>>>>>>>>>>>>> basis for any further discussion. I don't think it's >>>>>>>>>>>>>>>>>>>>>>> effective to start >>>>>>>>>>>>>>>>>>>>>>> with a completely different design because we are bound >>>>>>>>>>>>>>>>>>>>>>> to have the same >>>>>>>>>>>>>>>>>>>>>>> discussions all over again. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, Jan >>>>>>>>>>>>>>>>>>>>>>> On 08.05.24 12:11, Walaa Eldin Moustafa wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The only consensus the community had was on the >>>>>>>>>>>>>>>>>>>>>>> object model through the most recent voting thread [1]. >>>>>>>>>>>>>>>>>>>>>>> This kind of >>>>>>>>>>>>>>>>>>>>>>> consensus was not present during the doc discussions, >>>>>>>>>>>>>>>>>>>>>>> and this should be >>>>>>>>>>>>>>>>>>>>>>> evident from the fact the last doc state listed 5 >>>>>>>>>>>>>>>>>>>>>>> alternatives with no >>>>>>>>>>>>>>>>>>>>>>> particular conclusion. I am not quite sure what type of >>>>>>>>>>>>>>>>>>>>>>> consensus we are >>>>>>>>>>>>>>>>>>>>>>> referring to here given all the follow up discussions, >>>>>>>>>>>>>>>>>>>>>>> alternatives, etc. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Due to the separate object model, the PR is >>>>>>>>>>>>>>>>>>>>>>> fundamentally different from the doc in the sense that it >>>>>>>>>>>>>>>>>>>>>>> does not propose a new >>>>>>>>>>>>>>>>>>>>>>> metadata model but rather formalizes some new table and >>>>>>>>>>>>>>>>>>>>>>> view properties >>>>>>>>>>>>>>>>>>>>>>> related to MVs. That is also one reason there are no >>>>>>>>>>>>>>>>>>>>>>> repeated discussions. >>>>>>>>>>>>>>>>>>>>>>> That said, if you feel there is a repeated discussion >>>>>>>>>>>>>>>>>>>>>>> (which I do not see >>>>>>>>>>>>>>>>>>>>>>> so far), it would be best to link the relevant >>>>>>>>>>>>>>>>>>>>>>> discussion from the doc in a >>>>>>>>>>>>>>>>>>>>>>> comment. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Happy to move the discussion elsewhere if there is >>>>>>>>>>>>>>>>>>>>>>> sufficient support for this idea, but as things stand, >>>>>>>>>>>>>>>>>>>>>>> I do not see this as >>>>>>>>>>>>>>>>>>>>>>> an efficient way to make progress. It sounds like we have >>>>>>>>>>>>>>>>>>>>>>> been re-emphasizing >>>>>>>>>>>>>>>>>>>>>>> the same points in the last two replies, so I will let >>>>>>>>>>>>>>>>>>>>>>> others chime in at >>>>>>>>>>>>>>>>>>>>>>> this point. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread/rotmqzmwk5jrcsyxhzjhrvcjs5v3yjcc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Wed, May 8, 2024 at 2:31 AM Jan Kaul >>>>>>>>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The original google doc >>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?usp=sharing> >>>>>>>>>>>>>>>>>>>>>>>> discussed multiple >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>