Just to clarify: I think we have a consensus on the two possible
options, so a vote could help us reach a consensus on which option to
adopt.

Anyway, discussions are still going on about this topic :)

Regards
JB

On Wed, Apr 3, 2024 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>
> If there is consensus, great. We don't usually have a vote when there is 
> already consensus. That said, I haven't really seen a confirmation that we 
> have consensus, like a thread where people that originally had different 
> perspectives all said they favored the same option.
>
> It can help to build clarity by starting a new thread (this one is 70+ 
> messages) with a clear summary (_not_ a doc) of the direction and ask people 
> to speak up if they do or don't agree.
>
> Ryan
>
> On Wed, Apr 3, 2024 at 1:33 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> I thought we had a consensus in the doc, at least on the possible
>> options. I understood the vote was to adopt one of the options (that is
>> possible for a vote).
>>
>> If we still need more discussion on the possible options, or to reach a
>> consensus on a specific option, it makes sense to continue the
>> discussion on the doc as long as we are not "blocked" :)
>>
>> Regards
>> JB
>>
>> On Tue, Apr 2, 2024 at 9:12 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>> >
>> > I don't think we're in a position to open a vote (or maybe there's a 
>> > misunderstanding of what the vote is meant to achieve).
>> >
>> > We need to continue the discussion until there is a general consensus on 
>> > the direction we want to go (not on what options are available).
>> >
>> > The vote is a confirmation of the direction, not a way to settle 
>> > disagreements about approaches.
>> >
>> > I think we need to have a more focused discussion (this can either be at a 
>> > sync or we can schedule a time).
>> >
>> > -Dan
>> >
>> >
>> >
>> > On Mon, Apr 1, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> > wrote:
>> >>
>> >> Hi Walaa
>> >>
>> >> Yes, I think it makes sense to go with a vote, now that pros/cons are
>> >> clearly stated in the doc.
>> >>
>> >> Thanks !
>> >> Regards
>> >> JB
>> >>
>> >> On Tue, Apr 2, 2024 at 3:59 AM Walaa Eldin Moustafa
>> >> <wa.moust...@gmail.com> wrote:
>> >> >
>> >> > Hi all, there has not been new activity on the doc for some time. 
>> >> > Should we consider voting?
>> >> >
>> >> > On Thu, Mar 28, 2024 at 6:59 AM Jean-Baptiste Onofré 
>> >> > <j...@nanthrax.net> wrote:
>> >> >>
>> >> >> Yes, correct, thanks Manu for pointing it out.
>> >> >>
>> >> >> Thanks !
>> >> >> Regards
>> >> >> JB
>> >> >>
>> >> >> On Thu, Mar 28, 2024 at 9:55 AM Manu Zhang <owenzhang1...@gmail.com> 
>> >> >> wrote:
>> >> >> >
>> >> >> > I think Jan already created it
>> >> >> > https://github.com/apache/iceberg/issues/10043
>> >> >> >
>> >> >> > Jean-Baptiste Onofré <j...@nanthrax.net> wrote on Thu, Mar 28, 2024 at 16:46:
>> >> >> >>
>> >> >> >> Hi Walaa,
>> >> >> >>
>> >> >> >> Yes, I think it would be great to create the GH Issue with the
>> >> >> >> proposal template, it would allow us to track the proposal and link
>> >> >> >> the doc (the comments should go in the doc directly).
>> >> >> >> Please, let me know if I can help on that.
>> >> >> >>
>> >> >> >> I'm working on a PR to list the proposals on the website and the
>> >> >> >> "stale reminder".
>> >> >> >>
>> >> >> >> Thanks !
>> >> >> >> Regards
>> >> >> >> JB
>> >> >> >>
>> >> >> >> On Thu, Mar 28, 2024 at 6:52 AM Walaa Eldin Moustafa
>> >> >> >> <wa.moust...@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > Do we need to create a proposal issue specifically to track this 
>> >> >> >> > doc?
>> >> >> >> >
>> >> >> >> > Also, everyone, since there have been some updates, it would be 
>> >> >> >> > good to chime in again to discuss them. (doc link here for 
>> >> >> >> > convenience).
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Walaa.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Tue, Mar 26, 2024 at 11:37 PM Jean-Baptiste Onofré 
>> >> >> >> > <j...@nanthrax.net> wrote:
>> >> >> >> >>
>> >> >> >> >> It sounds good. I would also propose to use the "proposal 
>> >> >> >> >> process":
>> >> >> >> >> creating a GitHub issue with the "proposal" tag and linking the 
>> >> >> >> >> document
>> >> >> >> >> there in a comment.
>> >> >> >> >>
>> >> >> >> >> Regards
>> >> >> >> >> JB
>> >> >> >> >>
>> >> >> >> >> On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa
>> >> >> >> >> <wa.moust...@gmail.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> > Thanks Jan! To avoid spreading discussions across multiple 
>> >> >> >> >> > places, I will continue the comments on the doc. Also, it is 
>> >> >> >> >> > easier to run into communication gaps in email threads since 
>> >> >> >> >> > effectively we have one thread, but in docs we have many.
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Walaa.
>> >> >> >> >> >
>> >> >> >> >> > On Tue, Mar 26, 2024 at 6:27 AM Jan Kaul 
>> >> >> >> >> > <jank...@mailbox.org.invalid> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> I've added a description to the "Combined metadata" Option of 
>> >> >> >> >> >> Walaa's document. I'm also adding it here:
>> >> >> >> >> >>
>> >> >> >> >> >> This option treats the underlying view and storage table as a 
>> >> >> >> >> >> combined catalog object. The operation of this combined 
>> >> >> >> >> >> approach can be best demonstrated by looking at the different 
>> >> >> >> >> >> layers of the Iceberg implementation. In the top layer is the 
>> >> >> >> >> >> Iceberg library that interacts with a particular Iceberg 
>> >> >> >> >> >> catalog. The catalog handles the access to the metadata 
>> >> >> >> >> >> storage.
>> >> >> >> >> >> This option uses a combined storage object to store view and 
>> >> >> >> >> >> table metadata related to the materialized view. To avoid the 
>> >> >> >> >> >> definition of an entirely new metadata format, the storage 
>> >> >> >> >> >> object is composed of the view and table metadata. 
>> >> >> >> >> >> Additionally, the combined storage object has a single 
>> >> >> >> >> >> identifier in the catalogs. The Iceberg library treats the 
>> >> >> >> >> >> materialized view as a separate view and a storage table 
>> >> >> >> >> >> object; it is only at the catalog and storage layer that the 
>> >> >> >> >> >> materialized view is treated as a single entity.
>> >> >> >> >> >> To reuse most of the existing TableCatalog, ViewCatalog and 
>> >> >> >> >> >> their operations, the table and view catalog can be thought 
>> >> >> >> >> >> of as “filters” (lenses), that allow the interaction only 
>> >> >> >> >> >> with the corresponding part of the MV storage object. 
>> >> >> >> >> >> Performing a “CommitView” operation on the view catalog will 
>> >> >> >> >> >> only affect the view metadata part of the combined MV storage 
>> >> >> >> >> >> object. And similarly, performing a “CommitTable” operation 
>> >> >> >> >> >> on the table catalog will only affect the table metadata part 
>> >> >> >> >> >> of the combined MV storage object. Both catalogs use the same 
>> >> >> >> >> >> identifier for operations on the materialized view.
>> >> >> >> >> >> The creation of a materialized view is done with the 
>> >> >> >> >> >> “createView” operation (with additional materialization flag) 
>> >> >> >> >> >> on the view catalog, creating a combined MV storage object 
>> >> >> >> >> >> with an empty storage table.
>> >> >> >> >> >> One could entirely reuse the existing API for loading the 
>> >> >> >> >> >> materialized view metadata as follows. When calling the 
>> >> >> >> >> >> “loadView” method of the ViewCatalog, the catalog 
>> >> >> >> >> >> implementation fetches and caches the entire MV metadata 
>> >> >> >> >> >> object in process and returns the view metadata part. When 
>> >> >> >> >> >> the “loadTable” method of the TableCatalog is then called to 
>> >> >> >> >> >> obtain the storage table, it returns the table part of the 
>> >> >> >> >> >> cached MV metadata object.
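
The layering described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Iceberg's actual API: the class names `MVStorageObject` and `MetadataStore`, the snake_case method names, the dict-based metadata, and the `fetch_count` counter are all hypothetical, and plain (non-materialized) views are left out. It shows the two catalogs acting as "lenses" on one combined object that lives under a single identifier, with "loadTable" served from the object cached by "loadView".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MVStorageObject:
    """Hypothetical combined object: view metadata plus storage-table metadata."""
    view_metadata: dict
    table_metadata: dict = field(default_factory=dict)  # empty storage table at creation


class MetadataStore:
    """Hypothetical storage layer: one identifier maps to one combined object."""

    def __init__(self) -> None:
        self.objects: Dict[str, MVStorageObject] = {}
        self.cache: Dict[str, MVStorageObject] = {}
        self.fetch_count = 0  # counts round trips to the metadata storage

    def fetch(self, identifier: str) -> MVStorageObject:
        self.fetch_count += 1
        obj = self.objects[identifier]
        self.cache[identifier] = obj  # keep the whole MV object in process
        return obj


class ViewCatalog:
    """Lens that only exposes the view part of the combined object."""

    def __init__(self, store: MetadataStore) -> None:
        self.store = store

    def create_view(self, identifier: str, view_metadata: dict, materialized: bool = False) -> None:
        if not materialized:
            raise NotImplementedError("plain views are outside this sketch")
        self.store.objects[identifier] = MVStorageObject(dict(view_metadata))

    def load_view(self, identifier: str) -> dict:
        return self.store.fetch(identifier).view_metadata

    def commit_view(self, identifier: str, updates: dict) -> None:
        self.store.objects[identifier].view_metadata.update(updates)


class TableCatalog:
    """Lens that only exposes the storage-table part of the combined object."""

    def __init__(self, store: MetadataStore) -> None:
        self.store = store

    def load_table(self, identifier: str) -> dict:
        obj = self.store.cache.get(identifier)
        if obj is None:  # nothing cached yet: fall back to a fetch
            obj = self.store.fetch(identifier)
        return obj.table_metadata

    def commit_table(self, identifier: str, updates: dict) -> None:
        self.store.objects[identifier].table_metadata.update(updates)


store = MetadataStore()
views = ViewCatalog(store)
tables = TableCatalog(store)

views.create_view("db.mv", {"sql": "SELECT 1"}, materialized=True)
view_meta = views.load_view("db.mv")      # fetches and caches the combined object
table_meta = tables.load_table("db.mv")   # answered from the cached object
tables.commit_table("db.mv", {"current-snapshot-id": 1})  # touches the table part only
```

Both catalogs use the same identifier ("db.mv" here), and each commit changes only its own half of the combined object, which is the property the description relies on to reuse the existing operations.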
>> >> >> >> >> >>
>> >> >> >> >> >> Best wishes,
>> >> >> >> >> >>
>> >> >> >> >> >> Jan
>> >> >> >> >> >>
>> >> >> >> >> >> On 3/26/24 9:08 AM, Jan Kaul wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> I think it makes sense if I use the "Description" section of 
>> >> >> >> >> >> your document to clarify what I imagine a combined MV 
>> >> >> >> >> >> solution would look like. This would simplify the discussion 
>> >> >> >> >> >> about pros and cons, because we can reference or extend the 
>> >> >> >> >> >> description. I will try to find the time later today.
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks,
>> >> >> >> >> >>
>> >> >> >> >> >> Jan
>> >> >> >> >> >>
>> >> >> >> >> >> On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks Jan! I am not sure if you would like to make 
>> >> >> >> >> >> suggestions to revise the options themselves or the current 
>> >> >> >> >> >> options' pros and cons. In either case, as mentioned earlier, 
>> >> >> >> >> >> we can do that on the doc, and once we agree on the options 
>> >> >> >> >> >> and their pros and cons we can move forward. How does that 
>> >> >> >> >> >> sound?
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks,
>> >> >> >> >> >> Walaa.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul 
>> >> >> >> >> >> <jank...@mailbox.org.invalid> wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>> I have the feeling that the current pros and cons from the 
>> >> >> >> >> >>> summary target a version of the MV spec that wasn't really 
>> >> >> >> >> >>> part of the discussion. The current arguments target a 
>> >> >> >> >> >>> completely new specification for materialized views, which, 
>> >> >> >> >> >>> as we agreed, is out of scope. Instead of a completely new 
>> >> >> >> >> >>> specification, the argument was made for an MV metadata 
>> >> >> >> >> >>> object that embeds the View and the Table metadata, which was 
>> >> >> >> >> >>> Option 6 in Jack's summary document. With that approach the 
>> >> >> >> >> >>> "commitView" and "commitTable" operations don't have to be 
>> >> >> >> >> >>> changed and only the "loadView" operation has to be adapted. 
>> >> >> >> >> >>> Additionally, compaction and snapshot expiration can be 
>> >> >> >> >> >>> reused for the embedded solution. With that in mind, the 
>> >> >> >> >> >>> cons 2, 4, 5, 6 from the summary don't really apply.
>> >> >> >> >> >>>
>> >> >> >> >> >>> Furthermore, I think we should distinguish between pros and 
>> >> >> >> >> >>> cons for the implementers and the users, because most of the 
>> >> >> >> >> >>> pros (no new operations) for separate objects (option 1) are 
>> >> >> >> >> >>> for the implementers and most of the pros (single logical 
>> >> >> >> >> >>> object, doesn't require 2 loads) for combined objects 
>> >> >> >> >> >>> (option 3) are for the users. In my opinion, in the long run 
>> >> >> >> >> >>> the design decisions should be focused more on the user 
>> >> >> >> >> >>> preferences than on those of the implementers.
>> >> >> >> >> >>>
>> >> >> >> >> >>> On 3/25/24 14:49, Benny Chow wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>> Hi Manu
>> >> >> >> >> >>>
>> >> >> >> >> >>> This is Walaa's Spark implementation for option 1:  
>> >> >> >> >> >>> https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
>> >> >> >> >> >>> There's no code for option 2 yet.
>> >> >> >> >> >>>
>> >> >> >> >> >>> Best
>> >> >> >> >> >>> Benny
>> >> >> >> >> >>>
>> >> >> >> >> >>> On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang 
>> >> >> >> >> >>> <owenzhang1...@gmail.com> wrote:
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Thanks Walaa for the summary. It's unclear to me from the 
>> >> >> >> >> >>>> context which is the reference implementation for option 1 
>> >> >> >> >> >>>> and which is the reference MV spec for option 2. I can find 
>> >> >> >> >> >>>> some links in the References section, but I am not sure 
>> >> >> >> >> >>>> which one belongs to which option.
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa 
>> >> >> >> >> >>>> <wa.moust...@gmail.com> wrote:
>> >> >> >> >> >>>>>
>> >> >> >> >> >>>>> Thanks Himadri for the questions. At this point, our 
>> >> >> >> >> >>>>> objective is to have a common understanding of both 
>> >> >> >> >> >>>>> options and their pros and cons. The best way to achieve 
>> >> >> >> >> >>>>> this is to iterate on the doc to discuss the details of 
>> >> >> >> >> >>>>> each option or their pros and cons. We can always add more 
>> >> >> >> >> >>>>> details or update the pros and cons. The main thing is to 
>> >> >> >> >> >>>>> keep the options to two so that we keep the scope 
>> >> >> >> >> >>>>> manageable.
>> >> >> >> >> >>>>>
>> >> >> >> >> >>>>> Once we have a common understanding, it will be easy to 
>> >> >> >> >> >>>>> make a choice and move forward. Therefore, I would suggest 
>> >> >> >> >> >>>>> reframing your questions as suggestions to add more details 
>> >> >> >> >> >>>>> to the options, questions on how either option works, or 
>> >> >> >> >> >>>>> discussions of their pros and cons on the doc.
>> >> >> >> >> >>>>>
>> >> >> >> >> >>>>> Thanks,
>> >> >> >> >> >>>>> Walaa.
>> >> >> >> >> >>>>>
>
>
>
> --
> Ryan Blue
> Tabular
