Hi all,

This is a reminder that our second discussion meeting is tomorrow, 10pm-11pm PST. The link is below; everyone is welcome to join!
Join Zoom Meeting
https://uci.zoom.us/j/91986206610

Meeting ID: 919 8620 6610
One tap mobile
+16699006833,,91986206610# US (San Jose)
+12532158782,,91986206610# US (Tacoma)

Dial by your location
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
Meeting ID: 919 8620 6610
Find your local number: https://uci.zoom.us/u/acyXcc43Cd

Join by Skype for Business
https://uci.zoom.us/skype/91986206610

Thanks,
Botong

On Wed, May 5, 2021 at 9:55 AM Botong Huang <pku...@gmail.com> wrote:

Hi Stamatis and all,

Thanks for the interest! Let's tentatively schedule the next meeting for next Wednesday, May 12, 10pm-11pm PST. Please let us know if new needs come up.

Best,
Botong

On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:

Hello,

I really regret missing the first meeting, sorry about that. I added my preferences in the document.
I will make sure to attend the next one and help as much as I can.

I haven't had the chance yet to go over the paper but will try to do it before the next meeting.

For me the following dates are more convenient than others, so it would be nice if we could arrange it then.

Thu, May 6, 10pm PST
Tue, May 12, 10pm PST

Best,
Stamatis

On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:

I have added my time preferences to the doc [1]. I am generally available any evening Mon - Thu.
How about we meet Monday 10th May?

Stamatis, Jesus, given the complexity of this work, I would very much appreciate your insight, as experts in optimizer theory. Could one of you join the next meeting? Of course we should choose a time that works for everyone's schedule.

Julian

[1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pku...@gmail.com> wrote:

We didn't record it; we will try to record the following meetings. Please add your time preference in the doc, so that we can find a meeting time that works for more people.

Thanks,
Botong

On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vil...@hazelcast.com> wrote:

Is there a recording available?
Viliam

On Wed, 28 Apr 2021 at 00:15, Botong Huang <pku...@gmail.com> wrote:

Hi all,

The meeting yesterday was fun and productive. As discussed, this is the call to schedule our second meeting.

We encourage everyone to add their time preferences during 05/01 - 05/15 here:

https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

Thanks,
Botong

On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pku...@gmail.com> wrote:

Hi all,
We've created a zoom meeting below for our meeting next Monday (9pm-10:30pm PST on 04/26).
Talk to you all soon!
Join Zoom Meeting
https://uci.zoom.us/j/91279732686

Meeting ID: 912 7973 2686
One tap mobile
+16699006833,,91279732686# US (San Jose)
+12532158782,,91279732686# US (Tacoma)

Dial by your location
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
Meeting ID: 912 7973 2686
Find your local number: https://uci.zoom.us/u/aykHTkJBh

Join by Skype for Business
https://uci.zoom.us/skype/91279732686

Thanks,
Botong

On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pku...@gmail.com> wrote:

Hi all,

According to the preferences collected, we are tentatively scheduling our meeting at 9pm-10:30pm PST on Monday, 04/26.

We will give a presentation about Tempura, followed by a free discussion.

Please let us know if there are any other new requests.
A few days before the meeting, I will send out a Zoom meeting link.

Thanks,
Botong

On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pku...@gmail.com> wrote:

Hi Julian and all,

We've posted the Tempura code base below. Feel free to take a quick peek at the last five commits.
https://github.com/alibaba/cost-based-incremental-optimizer/commits/main

I've also opened a Jira (CALCITE-4568 <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve as the umbrella Jira for the feature.

In the meantime, we encourage everyone to enter their time preferences for our first meeting here:

https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

Thanks,
Botong

On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

I have added my time preferences to the doc.

Before we meet, could you publish a PR for us to review?

Initial discussions will need to be about architecture and high-level design. So I would ask Calcite reviewers not to review the PR line-by-line (or to leave comments in GitHub) but try to understand the design holistically, and prepare questions/comments before the meeting.

Botong, can you please create a Calcite JIRA case for this task?
JIRA is how we track long-running tasks such as this.

Julian

On Apr 3, 2021, at 5:15 PM, Botong Huang <pku...@gmail.com> wrote:

Hi all,

Apologies for the delay. It took us some time to clean up our code base and publicly release it (which will be out soon) for a quick peek.

We are ready to present our work. Let's schedule a time for a Zoom meeting and discuss how to integrate Tempura into Calcite.

Since some of our team members are in China, we prefer the time slot of 7:00pm-11:30pm PST any day. I've added our time preference in the shared doc below.

https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

We encourage everyone to add their time preferences (during 04/15-04/30) in this doc. In a week or so, we will try to settle on a time that works for most.

Thanks,
Botong

On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pku...@gmail.com> wrote:

Hi Julian and Rui,

Sounds good to us. Please give us some time to prepare some slides for the meeting.

I've created a doc below for discussion.
Please feel free to add more in here:

https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

Thanks,
Botong

On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:

PS The “editable doc” that Rui refers to is also a good idea. I think we should create it to continue discussion after the first meeting.

Julian

On Jan 28, 2021, at 11:16 AM, Julian Hyde <jhyde.apa...@gmail.com> wrote:

I think good next steps would be a PR and a meeting. The PR will allow us to read the code, but I think we should do the first round of questions at the meeting. The meeting could perhaps start with a presentation of the paper (do you have some slides you are planning to present at VLDB, Botong?) and then move on to questions about the concepts, which alternatives were considered, and how the concepts map onto other current and future concepts in Calcite.

I don’t think we should start “reviewing” the PR line-by-line at this point. We need to understand the high-level concepts and design choices. If we start reviewing the PR we will get lost in the details.
I know that integrating a major change is hard; I doubt that we will be able to integrate everything, but we can build understanding about where Calcite needs to go, and I hope integrate a good amount of code to help us get there.

As I said before, after the integration I would like people to be able to experiment with it and use it in their production systems. That way, it will not be an experiment that withers, but a feature set that integrates with other Calcite features and gets stronger over time.

Julian

On Jan 28, 2021, at 10:54 AM, Rui Wang <amaliu...@apache.org> wrote:

For me to participate in the discussion of the above questions, I will need to read a lot more to know the relevant context and will likely ask lots of questions :-). An editable doc is probably good for questions and back-and-forth discussion.

-Rui

On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <amaliu...@apache.org> wrote:

I am also happy to help push this work into Calcite (review code and doc, etc.).
While you can share your code so people can get a better idea of how it is implemented, I think it would also be nice to have a doc to discuss the open questions above. Here are some points, copied from above:

1. Can this solution be compatible with existing solutions in Calcite for streaming, materialized view maintenance, and multi-query optimization (Sigma and Delta relational operators, lattice, and Spool operator)?
2. Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?
3. Will this work hasten the arrival of multi-objective parametric query optimization [1] in Calcite?
4. Probably SQL shell support.

[1]: https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext

-Rui

On Wed, Jan 27, 2021 at 6:52 PM Albert <zinki...@gmail.com> wrote:

It would be very nice to see a POC of your work.
On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pku...@gmail.com> wrote:

Hi Julian,

Just wondering if there are any updates? We are wondering if it would help to post our code for a quick preview.

Thanks,
Botong

On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pku...@gmail.com> wrote:

Hi Julian,

Thanks for your interest! Sure, let's figure out a plan that best benefits the community. Here are some clarifications that hopefully answer your questions.

In our work (Tempura), users specify the set of time points to consider running and a cost function that expresses their preference over time; Tempura then generates the best incremental plan, the one that minimizes the overall cost function.
In this incremental plan, the sub-plans at different time points can be different from each other, as opposed to identical plans in all delta runs as in streaming or IVM. As mentioned in Section 2.1 of the Tempura paper, we can mimic the current streaming implementation by specifying two (logical) time points in Tempura, representing the initial run and later delta runs respectively. In general, note that Tempura supports various forms of incremental computing, not only the small-delta append-only data model in streaming systems. That's why we believe Tempura subsumes the current streaming support, as well as any IVM implementations.

About the cost model, we did not come up with a separate cost model, but rather extended the existing one. Similar to multi-objective optimization, costs incurred at different time points are considered different dimensions. Tempura lets users supply a function that converts this cost vector into a final cost.
So under this function, any two incremental plans are still comparable and there is an overall optimum. I guess we can go down the route of multi-objective parametric query optimization instead if there is a need.

Next, on materialized views and multi-query optimization: since our multi-time-point plan naturally involves materializing intermediate results for later time points, we need to solve the problem of choosing materializations and include the cost of saving and reusing the materializations when costing and comparing plans. We borrowed multi-query optimization techniques to solve this problem even though we are looking at a single query. As a result, we think our work is orthogonal to Calcite's facilities around utilizing existing views, lattice, etc. We do feel that the multi-query optimization component can be adopted for wider use, but we probably need more suggestions from the community.
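As a rough illustration of the cost-vector idea in the reply above (this is not Tempura's or Calcite's code; every name below is made up for the sketch), each plan gets one cost per time point, and a user-supplied function folds that vector into a single number so that any two plans can be compared. A weighted sum is assumed here only because it is the simplest such function:

```python
from typing import Callable, Dict, Sequence

# Hypothetical names throughout; none of this is Tempura or Calcite API.
CostVector = Sequence[float]              # one entry per time point
CostFunction = Callable[[CostVector], float]


def weighted_sum(weights: Sequence[float]) -> CostFunction:
    """Build a user-supplied function that folds a cost vector into one number."""
    def fold(costs: CostVector) -> float:
        if len(costs) != len(weights):
            raise ValueError("cost vector and weights must have the same length")
        return sum(w * c for w, c in zip(weights, costs))
    return fold


def best_plan(plans: Dict[str, CostVector], cost_fn: CostFunction) -> str:
    """Return the name of the plan with the lowest final cost."""
    return min(plans, key=lambda name: cost_fn(plans[name]))


# Two made-up incremental plans over two time points (early run, final run).
plans = {
    "eager": [8.0, 2.0],   # does most of the work at the early time point
    "lazy": [1.0, 10.0],   # defers most of the work to the final time point
}

# Early resources are cheap relative to the final run.
prefer_early = weighted_sum([0.2, 1.0])
# Early resources are expensive relative to the final run.
prefer_late = weighted_sum([3.0, 1.0])

print(best_plan(plans, prefer_early))
print(best_plan(plans, prefer_late))
```

With the first weighting the eager plan has the lower final cost, and with the second the lazy plan does, which is the sense in which the user's preference over time picks the optimum.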
Lastly, our current implementation is set up in Java code; it should be straightforward to hook it up with a SQL shell.

Thanks,
Botong

On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

Botong,

This is very exciting; congratulations on this research, and thank you for contributing it back to Calcite.

The research touches several areas in Calcite: streaming, materialized view maintenance, and multi-query optimization. As we already have some solutions in those areas (Sigma and Delta relational operators, lattice, and Spool operator), it will be interesting to see whether we can make them compatible, or whether one concept can subsume others.

Your work differs from streaming queries in that your relations are used by “external” user queries, whereas in pure streaming queries, the only activity is the change propagation.
Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?

I wonder whether this work will hasten the arrival of multi-objective parametric query optimization [1] in Calcite.

I will make time over the next few days to read and digest your paper. Then I expect that we will have a back-and-forth process to create something that will be useful for the broader community.

One thing will be particularly useful: making this functionality available from a SQL shell, so that people can experiment with this functionality without writing Java code or setting up complex databases and metadata. I have in mind something like the simple DDL operations that are available in Calcite’s ‘server’ module. I wonder whether we could devise some kind of SQL syntax for a “multi-query”.
Julian

[1] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext

On Dec 23, 2020, at 8:55 PM, Botong Huang <pku...@gmail.com> wrote:

Thanks Aron for pointing this out. To see the figure, please refer to Fig 3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf

Best,
Botong

On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <taojia...@gmail.com> wrote:

Seems interesting. The picture cannot be seen in the mail; could you open a JIRA for this, so that people who are interested can subscribe to it?

Regards!
Aron Tao

Botong Huang <bot...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:

Hi all,

This is a proposal to extend the Calcite optimizer into a general incremental query optimizer, based on our research paper published in VLDB 2021:
Tempura: a general cost-based optimizer framework for incremental data processing

We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data warehouse is planning to use this incremental query optimizer to alleviate cluster-wise resource skewness:
Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing

To the best of our knowledge, this is the first general cost-based incremental optimizer that can find the best plan across multiple families of incremental computing methods, including IVM, Streaming, DBToaster, etc.
Experiments (in the paper) show that the generated best plan is consistently much better than the plans from each individual method alone.

In general, incremental query planning is central to database view maintenance and stream processing systems, and is being adopted in active databases, resumable query execution, approximate query processing, etc. We are hoping that this feature can help widen the spectrum of Calcite and solicit more use cases and adoption of Calcite.

Below is a brief description of the technical details. Please refer to the Tempura paper for more details. We are also working on a journal version of the paper with more implementation details.

Currently the query plan generated by Calcite is meant to be executed altogether at once.
In the proposal, Calcite’s memo will be extended with temporal information so that it is capable of generating incremental plans that include multiple sub-plans to execute at different time points.

The main idea is to view each table as one that changes over time (a Time-Varying Relation, TVR). To achieve that, we introduced TvrMetaSet into Calcite’s memo, besides RelSet and RelSubset, to track related RelSets of a changing table (e.g. snapshot of the table at a certain time, delta of the table between two time points, etc.).

[image: image.png]

For example, in the above figure, each vertical line is a TvrMetaSet representing a TVR (S, R, S left outer join R, etc.). Horizontal lines represent time. Each black dot in the grid is a RelSet. Users can write TVR Rewrite Rules to describe valid transformations between these dots.
> For example, the blue lines are inter-TVR rules that describe how to
> compute a certain RelSet of a TVR from RelSets of other TVRs. The red
> lines are intra-TVR rules that describe transformations within a TVR.
> All TVR rewrite rules are logical rules. All existing Calcite rules
> still work in the new Volcano system without modification.
>
> All changes in this feature will consist of four parts:
> 1. Memo extension with TvrMetaSet.
> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
>    well as links in between the nodes.
> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> 4. Multi-query optimization, used to find the best incremental plan
>    involving multiple time points.
>
> Note that this feature is an extension in nature and thus, when
> disabled, does not change any existing Calcite behavior.
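
To make the snapshot/delta vocabulary in the quoted proposal concrete, here is a minimal Java sketch. It is not Calcite or Tempura code: TvrSketch, row, join, union, joinDelta and advance are hypothetical names chosen for illustration. It also simplifies the setting, using an inner join rather than the left outer join from the figure and assuming insert-only deltas with set semantics. joinDelta is in the spirit of an inter-TVR rule (a delta of "S join R" computed from snapshots and deltas of S and R), and advance is in the spirit of an intra-TVR rule (a later snapshot computed from an earlier snapshot plus a delta of the same TVR).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these names are Calcite or Tempura APIs. */
final class TvrSketch {

    /** A base-table row: a join key plus a payload. */
    static Map.Entry<Integer, String> row(int key, String value) {
        return Map.entry(key, value);
    }

    /** Inner equi-join on the key, written as a nested loop for clarity. */
    static Set<List<Object>> join(Set<Map.Entry<Integer, String>> s,
                                  Set<Map.Entry<Integer, String>> r) {
        Set<List<Object>> out = new HashSet<>();
        for (Map.Entry<Integer, String> left : s) {
            for (Map.Entry<Integer, String> right : r) {
                if (left.getKey().equals(right.getKey())) {
                    out.add(List.<Object>of(left.getKey(), left.getValue(), right.getValue()));
                }
            }
        }
        return out;
    }

    /** Set union that leaves both inputs untouched. */
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    /**
     * Inter-TVR style step (assumes insert-only deltas):
     * delta(S join R) = (deltaS join R_old) union (S_new join deltaR).
     */
    static Set<List<Object>> joinDelta(Set<Map.Entry<Integer, String>> sOld,
                                       Set<Map.Entry<Integer, String>> sDelta,
                                       Set<Map.Entry<Integer, String>> rOld,
                                       Set<Map.Entry<Integer, String>> rDelta) {
        Set<Map.Entry<Integer, String>> sNew = union(sOld, sDelta);
        return union(join(sDelta, rOld), join(sNew, rDelta));
    }

    /** Intra-TVR style step: snapshot(t2) = snapshot(t1) union delta(t1, t2). */
    static <T> Set<T> advance(Set<T> snapshotOld, Set<T> delta) {
        return union(snapshotOld, delta);
    }

    public static void main(String[] args) {
        Set<Map.Entry<Integer, String>> sOld = Set.of(row(1, "s1"), row(2, "s2"));
        Set<Map.Entry<Integer, String>> sDelta = Set.of(row(3, "s3"));
        Set<Map.Entry<Integer, String>> rOld = Set.of(row(1, "r1"));
        Set<Map.Entry<Integer, String>> rDelta = Set.of(row(2, "r2"), row(3, "r3"));

        // Incremental plan: reuse the t1 join result and add only the delta.
        Set<List<Object>> incremental =
            advance(join(sOld, rOld), joinDelta(sOld, sDelta, rOld, rDelta));
        // Batch plan: recompute the join over the full t2 snapshots.
        Set<List<Object>> full = join(advance(sOld, sDelta), advance(rOld, rDelta));

        System.out.println(incremental.equals(full)); // prints true
    }
}
```

The two plans in main reach the same t2 snapshot by different paths through the grid, which is the kind of choice the proposal's optimizer would make on cost. Deletions and outer joins need richer delta representations than this sketch covers.
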
>
> Other than scenarios in the paper, we also applied this
> Calcite-extended incremental query optimizer to a type of periodic
> query called the “range query” in Alibaba’s data warehouse. It achieved
> cost savings of 80% on total CPU and memory consumption, and 60% on
> end-to-end execution time.
>
> All comments and suggestions are welcome. Thanks and happy holidays!
>
> Best,
> Botong

> --
> Viliam Durina
> Jet Developer
> hazelcast®