Hello, I really regret missing the first meeting, sorry about that. I added my preferences in the document. I will make sure to attend the next one and help as much as I can.
I haven't had the chance to go over the paper yet, but I will try to do so before the next meeting. The following dates are the most convenient for me, so it would be nice if we could arrange the meeting on one of them:

Thu, May 6, 10pm PST
Tue, May 12, 10pm PST

Best,
Stamatis

On Sat, May 1, 2021 at 9:42 PM Julian Hyde <[email protected]> wrote:

> I have added my time preferences to the doc [1]. I am generally available any evening Mon - Thu. How about we meet Monday 10th May?
>
> Stamatis, Jesus, Given the complexity of this work, I would very much appreciate your insight, as experts in optimizer theory. Could one of you join the next meeting? Of course we should choose a time that works for everyone's schedule.
>
> Julian
>
> [1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>
> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <[email protected]> wrote:
> >
> > We didn't record it, we will try to record the following meetings. Please add your time preference in the docs, so that we can find a meeting time that works for more people.
> >
> > Thanks,
> > Botong
> >
> > On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <[email protected]> wrote:
> >
> > > Is there a recording available?
> > > Viliam
> > >
> > > On Wed, 28 Apr 2021 at 00:15, Botong Huang <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The meeting yesterday was fun and productive. As discussed, this is the call to schedule our second meeting.
> > > >
> > > > We encourage everyone to add their time preferences during 05/01 - 05/15 here:
> > > >
> > > > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >
> > > > Thanks,
> > > > Botong
> > > >
> > > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > > We've created a zoom meeting below for our meeting next Monday (9pm-10:30pm PST on 04/26).
> > > > > Talk to you all soon!
> > > > >
> > > > > Join Zoom Meeting
> > > > > https://uci.zoom.us/j/91279732686
> > > > >
> > > > > Meeting ID: 912 7973 2686
> > > > > One tap mobile
> > > > > +16699006833,,91279732686# US (San Jose)
> > > > > +12532158782,,91279732686# US (Tacoma)
> > > > >
> > > > > Dial by your location
> > > > > +1 669 900 6833 US (San Jose)
> > > > > +1 253 215 8782 US (Tacoma)
> > > > > +1 346 248 7799 US (Houston)
> > > > > +1 301 715 8592 US (Washington DC)
> > > > > +1 312 626 6799 US (Chicago)
> > > > > +1 646 558 8656 US (New York)
> > > > > Meeting ID: 912 7973 2686
> > > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > > >
> > > > > Join by Skype for Business
> > > > > https://uci.zoom.us/skype/91279732686
> > > > >
> > > > > Thanks,
> > > > > Botong
> > > > >
> > > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <[email protected]> wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> According to the preferences collected, we are tentatively scheduling our meeting at 9pm-10:30pm PST on 04/26 Monday.
> > > > >>
> > > > >> We will give a presentation about Tempura, followed by a free discussion.
> > > > >>
> > > > >> Please let us know if there are any other requests. A few days before the meeting, I will send out a zoom meeting link.
> > > > >>
> > > > >> Thanks,
> > > > >> Botong
> > > > >>
> > > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <[email protected]> wrote:
> > > > >>
> > > > >>> Hi Julian and all,
> > > > >>>
> > > > >>> We've posted the Tempura code base below. Feel free to take a quick peek at the last five commits.
> > > > >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > > >>>
> > > > >>> I've also opened a Jira (CALCITE-4568 <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve as the umbrella Jira for the feature.
> > > > >>>
> > > > >>> In the meantime, we encourage everyone to enter the time preferences for our first meeting here:
> > > > >>>
> > > > >>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Botong
> > > > >>>
> > > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <[email protected]> wrote:
> > > > >>>
> > > > >>>> I have added my time preferences to the doc.
> > > > >>>>
> > > > >>>> Before we meet, could you publish a PR for us to review?
> > > > >>>>
> > > > >>>> Initial discussions will need to be about architecture and high-level design. So I would ask Calcite reviewers not to review the PR line-by-line (or to leave comments in GitHub) but try to understand the design holistically, and prepare questions/comments before the meeting.
> > > > >>>>
> > > > >>>> Botong, Can you please create a Calcite JIRA case for this task? JIRA is how we track long-running tasks such as this.
> > > > >>>>
> > > > >>>> Julian
> > > > >>>>
> > > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <[email protected]> wrote:
> > > > >>>> >
> > > > >>>> > Hi all,
> > > > >>>> >
> > > > >>>> > Apologies for the delay.
> > > > >>>> > It took us some time to clean up our code base and publicly release it (which will be out soon) for a quick peek.
> > > > >>>> >
> > > > >>>> > We are ready to present our work. Let's schedule a time for a Zoom meeting and discuss how to integrate Tempura into Calcite.
> > > > >>>> >
> > > > >>>> > Since some of our team members are in China, we prefer the time slot of 7:00pm-11:30pm PST any day. I've added our time preference in the shared doc below.
> > > > >>>> >
> > > > >>>> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >>>> >
> > > > >>>> > We encourage everyone to add their time preferences (during 04/15-04/30) in this doc. In a week or so, we will try to settle a time that works for most.
> > > > >>>> >
> > > > >>>> > Thanks,
> > > > >>>> > Botong
> > > > >>>> >
> > > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <[email protected]> wrote:
> > > > >>>> >
> > > > >>>> >> Hi Julian and Rui,
> > > > >>>> >>
> > > > >>>> >> Sounds good to us. Please give us some time to prepare some slides for the meeting.
> > > > >>>> >>
> > > > >>>> >> I've created a doc below for discussion. Please feel free to add more in here:
> > > > >>>> >>
> > > > >>>> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >>>> >>
> > > > >>>> >> Thanks,
> > > > >>>> >> Botong
> > > > >>>> >>
> > > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <[email protected]> wrote:
> > > > >>>> >>
> > > > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea.
> > > > >>>> >>> I think we should create it to continue discussion after the first meeting.
> > > > >>>> >>>
> > > > >>>> >>> Julian
> > > > >>>> >>>
> > > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <[email protected]> wrote:
> > > > >>>> >>>>
> > > > >>>> >>>> I think good next steps would be a PR and a meeting. The PR will allow us to read the code, but I think we should do the first round of questions at the meeting. The meeting could perhaps start with a presentation of the paper (do you have some slides you are planning to present at VLDB, Botong?) and then move on to questions about the concepts, which alternatives were considered, and how the concepts map onto other current and future concepts in calcite.
> > > > >>>> >>>>
> > > > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at this point. We need to understand the high-level concepts and design choices. If we start reviewing the PR we will get lost in the details.
> > > > >>>> >>>>
> > > > >>>> >>>> I know that integrating a major change is hard; I doubt that we will be able to integrate everything, but we can build understanding about where calcite needs to go, and I hope integrate a good amount of code to help us get there.
> > > > >>>> >>>>
> > > > >>>> >>>> As I said before, after the integration I would like people to be able to experiment with it and use it in their production systems.
> > > > >>>> >>>> That way, it will not be an experiment that withers, but a feature set that integrates with other calcite features and gets stronger over time.
> > > > >>>> >>>>
> > > > >>>> >>>> Julian
> > > > >>>> >>>>
> > > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <[email protected]> wrote:
> > > > >>>> >>>>>
> > > > >>>> >>>>> For me to participate in the discussion for the above questions, I will need to read a lot more to know relevant context and likely ask lots of questions :-). An editable doc is probably good for questions and back-and-forth discussion.
> > > > >>>> >>>>>
> > > > >>>> >>>>> -Rui
> > > > >>>> >>>>>
> > > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <[email protected]> wrote:
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> I am also happy to help push this work into Calcite (review code and doc, etc.).
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> While you can share your code so people can have more idea how it is implemented, I think it would be also nice to have a doc to discuss the open questions above. Some points that I copied here:
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> 1. Can this solution be compatible with existing solutions in Calcite Streaming, materialized view maintenance, and multi-query optimization (Sigma and Delta relational operators, lattice, and Spool operator),
> > > > >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?
> > > > >>>> >>>>>> 3. whether this work will hasten the arrival of multi-objective parametric query optimization [1] in Calcite.
> > > > >>>> >>>>>> 4. probably SQL shell support.
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> [1]: https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> -Rui
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <[email protected]> wrote:
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> it would be very nice to see a POC of your work.
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <[email protected]> wrote:
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>> Hi Julian,
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it would help to post our code for a quick preview.
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> Thanks,
> > > > >>>> >>>>>>>> Botong
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <[email protected]> wrote:
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>>> Hi Julian,
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Thanks for your interest!
> > > > >>>> >>>>>>>>> Sure, let's figure out a plan that best benefits the community. Here are some clarifications that hopefully answer your questions.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points to consider running and a cost function that expresses users' preference over time; Tempura will then generate the best incremental plan that minimizes the overall cost function.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can be different from each other, as opposed to identical plans in all delta runs as in streaming or IVM. As mentioned in §2.1 of the Tempura paper, we can mimic the current streaming implementation by specifying two (logical) time points in Tempura, representing the initial run and later delta runs respectively. In general, note that Tempura supports various forms of incremental computing, not only the small-delta append-only data model in streaming systems. That's why we believe Tempura subsumes the current streaming support, as well as any IVM implementations.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model, but rather extended the existing one. Similar to multi-objective optimization, costs incurred at different time points are considered different dimensions. Tempura lets users supply a function that converts this cost vector into a final cost. So under this function, any two incremental plans are still comparable and there is an overall optimum. I guess we can go down the route of multi-objective parametric query optimization instead if there is a need.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Next on materialized views and multi-query optimization, since our multi-time-point plan naturally involves materializing intermediate results for later time points, we need to solve the problem of choosing materializations and include the cost of saving and reusing the materializations when costing and comparing plans. We borrowed the multi-query optimization techniques to solve this problem even though we are looking at a single query. As a result, we think our work is orthogonal to Calcite's facilities around utilizing existing views, lattice etc.
> > > > >>>> >>>>>>>>> We do feel that the multi-query optimization component can be adopted for wider use, but we probably need more suggestions from the community.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should be straightforward to hook it up with a SQL shell.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Thanks,
> > > > >>>> >>>>>>>>> Botong
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <[email protected]> wrote:
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>>> Botong,
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> This is very exciting; congratulations on this research, and thank you for contributing it back to Calcite.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming, materialized view maintenance, and multi-query optimization. As we already have some solutions in those areas (Sigma and Delta relational operators, lattice, and Spool operator), it will be interesting to see whether we can make them compatible, or whether one concept can subsume others.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> Your work differs from streaming queries in that your relations are used by “external” user queries, whereas in pure streaming queries, the only activity is the change propagation. Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of multi-objective parametric query optimization [1] in Calcite.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> I will make time over the next few days to read and digest your paper. Then I expect that we will have a back-and-forth process to create something that will be useful for the broader community.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> One thing will be particularly useful: making this functionality available from a SQL shell, so that people can experiment with this functionality without writing Java code or setting up complex databases and metadata. I have in mind something like the simple DDL operations that are available in Calcite’s ’server’ module. I wonder whether we could devise some kind of SQL syntax for a “multi-query”.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> Julian
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> [1] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <[email protected]> wrote:
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please refer to Fig 3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> Best,
> > > > >>>> >>>>>>>>>>> Botong
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <[email protected]> wrote:
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you open a JIRA for this, so that people who are interested can subscribe to it?
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Regards!
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Aron Tao
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Botong Huang <[email protected]> wrote on Thu, Dec 24, 2020 at 3:18 AM:
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Hi all,
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general incremental query optimizer, based on our research paper published in VLDB 2021:
> > > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental data processing
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data warehouse is planning to use this incremental query optimizer to alleviate cluster-wise resource skewness:
> > > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based incremental optimizer that can find the best plan across multiple families of incremental computing methods, including IVM, Streaming, DBToaster, etc.
> > > > >>>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated best plan is consistently much better than the plans from each individual method alone.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view maintenance and stream processing systems, and is being adopted in active databases, resumable query execution, approximate query processing, etc. We are hoping that this feature can help widen the spectrum of Calcite and solicit more use cases and adoption of Calcite.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details. Please refer to the Tempura paper for more details. We are also working on a journal version of the paper with more implementation details.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be executed altogether at once.
> > > > >>>> >>>>>>>>>>>>> In the proposal, Calcite’s memo will be extended with temporal information so that it is capable of generating incremental plans that include multiple sub-plans to execute at different time points.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time (Time Varying Relations (TVR)). To achieve that we introduced TvrMetaSet into Calcite’s memo besides RelSet and RelSubset to track related RelSets of a changing table (e.g. snapshot of the table at certain time, delta of the table between two time points, etc.).
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> [image: image.png]
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> For example in the above figure, each vertical line is a TvrMetaSet representing a TVR (S, R, S left outer join R, etc.). Horizontal lines represent time. Each black dot in the grid is a RelSet. Users can write TVR Rewrite Rules to describe valid transformations between these dots.
> > > > >>>> >>>>>>>>>>>>> For example, the blue lines are inter-TVR rules that describe how to compute a certain RelSet of a TVR from RelSets of other TVRs. The red lines are intra-TVR rules that describe transformations within a TVR. All TVR rewrite rules are logical rules. All existing Calcite rules still work in the new volcano system without modification.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
> > > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as well as links in between the nodes.
> > > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> > > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan involving multiple time points.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus when disabled, does not change any existing Calcite behavior.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this Calcite-extended incremental query optimizer to a type of periodic query called the ‘‘range query’’ in Alibaba’s data warehouse. It achieved cost savings of 80% on total CPU and memory consumption, and 60% on end-to-end execution time.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Best,
> > > > >>>> >>>>>>>>>>>>> Botong
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> --
> > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > > > >>>> >>>>>>> no mistakes
> > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > >
> > > --
> > > Viliam Durina
> > > Jet Developer
> > > hazelcast®
> > >
> > > <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
> > > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
