I have added my time preferences to the doc [1]. I am generally
available any evening Mon - Thu. How about we meet Monday 10th May?

Stamatis, Jesus, Given the complexity of this work, I would very much
appreciate your insight, as experts in optimizer theory. Could one of
you join the next meeting? Of course we should choose a time that
works for everyone's schedule.

Julian

[1] 
https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pku...@gmail.com> wrote:
>
> We didn't record it, we will try to record the following meetings. Please
> add your time preference in the docs, so that we can find a meeting time
> that works for more people.
>
> Thanks,
> Botong
>
> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vil...@hazelcast.com> wrote:
>
> > Is there a recording available?
> > Viliam
> >
> > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pku...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > The meeting yesterday was fun and productive. As discussed, this is the
> > > call to schedule our second meeting.
> > >
> > > We encourage everyone to add their time preferences during 05/01 - 05/15
> > > here:
> > >
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pku...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > > We've created a zoom meeting below for our meeting next Monday
> > > > (9pm-10:30pm PST on 04/26).
> > > > Talk to you all soon!
> > > >
> > > > Join Zoom Meeting
> > > > https://uci.zoom.us/j/91279732686
> > > > <
> > >
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > > >
> > > >
> > > > Meeting ID: 912 7973 2686
> > > > One tap mobile
> > > > +16699006833,,91279732686# US (San Jose)
> > > > +12532158782,,91279732686# US (Tacoma)
> > > >
> > > > Dial by your location
> > > > +1 669 900 6833 US (San Jose)
> > > > +1 253 215 8782 US (Tacoma)
> > > > +1 346 248 7799 US (Houston)
> > > > +1 301 715 8592 US (Washington DC)
> > > > +1 312 626 6799 US (Chicago)
> > > > +1 646 558 8656 US (New York)
> > > > Meeting ID: 912 7973 2686
> > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > > <
> > >
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > > >
> > > >
> > > > Join by Skype for Business
> > > > https://uci.zoom.us/skype/91279732686
> > > > <
> > >
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Botong
> > > >
> > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pku...@gmail.com>
> > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> According to the preferences collected, we are tentatively scheduling
> > > our
> > > >> meeting at 9pm-10:30pm PST on 04/26 Monday.
> > > >>
> > > >> We will give a presentation about Tempura, followed by a free
> > > discussion.
> > > >>
> > > >> Please let us know if there are new other requests. Few days before
> > > >> the meeting, I will send out a zoom meeting link.
> > > >>
> > > >> Thanks,
> > > >> Botong
> > > >>
> > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pku...@gmail.com> wrote:
> > > >>
> > > >>> Hi Julian and all,
> > > >>>
> > > >>> We've posted the Tempura code base below. Feel free to take a quick
> > > peek
> > > >>> at the last five commits.
> > > >>>
> > > https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > >>>
> > > >>> I've also opened a Jira (CALCITE-4568
> > > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will
> > > serve
> > > >>> as the umbrella Jira for the feature.
> > > >>>
> > > >>> In the meantime, we encourage everyone to enter the time preferences
> > > for
> > > >>> our first meeting here:
> > > >>>
> > > >>>
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>
> > > >>> Thanks,
> > > >>> Botong
> > > >>>
> > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jhyde.apa...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> I have added my time preferences to the doc.
> > > >>>>
> > > >>>> Before we meet, could you publish a PR for us to review?
> > > >>>>
> > > >>>> Initial discussions will need to be about architecture and
> > high-level
> > > >>>> design. So I would ask Calcite reviewers not to review the PR
> > > line-by-line
> > > >>>> (or to leave comments in GitHub) but try to understand the design
> > > >>>> holistically, and prepare questions/comments before the meeting.
> > > >>>>
> > > >>>> Botong, Can you please create a Calcite JIRA case for this task?
> > JIRA
> > > >>>> how we track long-running tasks such as this.
> > > >>>>
> > > >>>> Julian
> > > >>>>
> > > >>>>
> > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pku...@gmail.com>
> > wrote:
> > > >>>> >
> > > >>>> > Hi all,
> > > >>>> >
> > > >>>> > Apology for the delay. It took us some time to clean up our code
> > > base
> > > >>>> and
> > > >>>> > publicly release it (which will be out soon) for a quick peek.
> > > >>>> >
> > > >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> > > >>>> > meeting and discuss how to integrate Tempura into Calcite.
> > > >>>> >
> > > >>>> > Since some of our team members are in China, we prefer the time
> > slot
> > > >>>> of
> > > >>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
> > > >>>> shared
> > > >>>> > doc below.
> > > >>>> >
> > > >>>>
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>> >
> > > >>>> > We encourage everyone to add their time preferences (during
> > > >>>> 04/15-04/30) in
> > > >>>> > this doc. In a week or so, we will try to settle a time that works
> > > for
> > > >>>> > most.
> > > >>>> >
> > > >>>> > Thanks,
> > > >>>> > Botong
> > > >>>> >
> > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pku...@gmail.com>
> > > >>>> wrote:
> > > >>>> >
> > > >>>> >> Hi Julian and Rui,
> > > >>>> >>
> > > >>>> >> Sounds good to us. Please give us some time to prepare some
> > slides
> > > >>>> for the
> > > >>>> >> meeting.
> > > >>>> >>
> > > >>>> >> I've created a doc below for discussion. Please feel free to add
> > > >>>> more in
> > > >>>> >> here:
> > > >>>> >>
> > > >>>> >>
> > > >>>>
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>> >>
> > > >>>> >> Thanks,
> > > >>>> >> Botong
> > > >>>> >>
> > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > > jhyde.apa...@gmail.com
> > > >>>> >
> > > >>>> >> wrote:
> > > >>>> >>
> > > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
> > > >>>> think we
> > > >>>> >>> should create it to continue discussion after the first meeting.
> > > >>>> >>>
> > > >>>> >>> Julian
> > > >>>> >>>
> > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > > jhyde.apa...@gmail.com>
> > > >>>> >>> wrote:
> > > >>>> >>>>
> > > >>>> >>>> I think good next steps would be a PR and a meeting. The PR
> > will
> > > >>>> allow
> > > >>>> >>> us to read the code, but I think we should do the first round of
> > > >>>> questions
> > > >>>> >>> at the meeting.  The meeting could perhaps start with a
> > > >>>> presentation of the
> > > >>>> >>> paper (do you have some slides you are planning to present at
> > > VLDB,
> > > >>>> >>> Botong?) and then move on to questions about the concepts, which
> > > >>>> >>> alternatives were considered, and how the concepts map onto
> > other
> > > >>>> current
> > > >>>> >>> and future concepts in calcite.
> > > >>>> >>>>
> > > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line
> > at
> > > >>>> this
> > > >>>> >>> point. We need to understand the high-level concepts and design
> > > >>>> choices. If
> > > >>>> >>> we start reviewing the PR we will get lost in the details.
> > > >>>> >>>>
> > > >>>> >>>> I know that integrating a major change is hard; I doubt that we
> > > >>>> will be
> > > >>>> >>> able to integrate everything, but we can build understanding
> > about
> > > >>>> where
> > > >>>> >>> calcite needs to go, and I hope integrate a good amount of code
> > to
> > > >>>> help us
> > > >>>> >>> get there.
> > > >>>> >>>>
> > > >>>> >>>> As I said before, after the integration I would like people to
> > be
> > > >>>> able
> > > >>>> >>> to experiment with it and use it in their production systems.
> > > That
> > > >>>> way, it
> > > >>>> >>> will not be an experiment that withers, but a feature set
> > > >>>> integrates with
> > > >>>> >>> other calcite features and gets stronger over time.
> > > >>>> >>>>
> > > >>>> >>>> Julian
> > > >>>> >>>>
> > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <amaliu...@apache.org>
> > > >>>> wrote:
> > > >>>> >>>>>
> > > >>>> >>>>> For me to participate in the discussion for the above
> > > questions,
> > > >>>> I
> > > >>>> >>> will
> > > >>>> >>>>> need to read a lot more to know relevant context and likely
> > ask
> > > >>>> lots of
> > > >>>> >>>>> questions :-).  A editable doc is probably good for questions
> > > and
> > > >>>> back
> > > >>>> >>> and
> > > >>>> >>>>> forward discussion.
> > > >>>> >>>>>
> > > >>>> >>>>>
> > > >>>> >>>>> -Rui
> > > >>>> >>>>>
> > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > > amaliu...@apache.org
> > > >>>> >
> > > >>>> >>> wrote:
> > > >>>> >>>>>>
> > > >>>> >>>>>> I am also happy to help push this work into Calcite (review
> > > code
> > > >>>> and
> > > >>>> >>> doc,
> > > >>>> >>>>>> etc.).
> > > >>>> >>>>>>
> > > >>>> >>>>>> While you can share your code so people can have more idea
> > how
> > > >>>> it is
> > > >>>> >>>>>> implemented, I think it would be also nice to have a doc to
> > > >>>> discuss
> > > >>>> >>> open
> > > >>>> >>>>>> questions above. Some points that I copy those to here:
> > > >>>> >>>>>>
> > > >>>> >>>>>> 1. Can this solution be compatible with existing solutions in
> > > >>>> Calcite
> > > >>>> >>>>>> Streaming, materialized view maintenance, and multi-query
> > > >>>> optimization
> > > >>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool
> > > >>>> operator),
> > > >>>> >>>>>> 2. Did you find that you needed two separate cost models -
> > one
> > > >>>> for
> > > >>>> >>> “view
> > > >>>> >>>>>> maintenance” and another for “user queries” - since the
> > > >>>> objectives of
> > > >>>> >>> each
> > > >>>> >>>>>> activity are so different?
> > > >>>> >>>>>> 3. whether this work will hasten the arrival of
> > multi-objective
> > > >>>> >>> parametric
> > > >>>> >>>>>> query optimization [1] in Calcite.
> > > >>>> >>>>>> 4. probably SQL shell support.
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>> [1]:
> > > >>>> >>>>>>
> > > >>>> >>>
> > > >>>>
> > >
> > https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>> -Rui
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zinki...@gmail.com>
> > > >>>> wrote:
> > > >>>> >>>>>>>
> > > >>>> >>>>>>> it would be very nice to see a POC of your work.
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > > >>>> pku...@gmail.com>
> > > >>>> >>> wrote:
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>> Hi Julian,
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering
> > if
> > > it
> > > >>>> >>> would
> > > >>>> >>>>>>> help
> > > >>>> >>>>>>>> to post our code for a quick preview.
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Thanks,
> > > >>>> >>>>>>>> Botong
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > > pku...@gmail.com
> > > >>>> >
> > > >>>> >>> wrote:
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>>> Hi Julian,
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan
> > that
> > > >>>> best
> > > >>>> >>>>>>> benefits
> > > >>>> >>>>>>>>> the community. Here are some clarifications that hopefully
> > > >>>> answer
> > > >>>> >>> your
> > > >>>> >>>>>>>>> questions.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time
> > points
> > > to
> > > >>>> >>>>>>> consider
> > > >>>> >>>>>>>>> running and a cost function that expresses users'
> > preference
> > > >>>> over
> > > >>>> >>>>>>> time,
> > > >>>> >>>>>>>>> Tempura will generate the best incremental plan that
> > > >>>> minimizes the
> > > >>>> >>>>>>>> overall
> > > >>>> >>>>>>>>> cost function.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time
> > > >>>> points
> > > >>>> >>> can
> > > >>>> >>>>>>> be
> > > >>>> >>>>>>>>> different from each other, as opposed to identical plans
> > in
> > > >>>> all
> > > >>>> >>> delta
> > > >>>> >>>>>>>> runs
> > > >>>> >>>>>>>>> as in streaming or IVM. As mentioned in $2.1 of the
> > Tempura
> > > >>>> paper,
> > > >>>> >>> we
> > > >>>> >>>>>>> can
> > > >>>> >>>>>>>>> mimic the current streaming implementation by specifying
> > two
> > > >>>> >>> (logical)
> > > >>>> >>>>>>>> time
> > > >>>> >>>>>>>>> points in Tempura, representing the initial run and later
> > > >>>> delta
> > > >>>> >>> runs
> > > >>>> >>>>>>>>> respectively. In general, note that Tempura supports
> > various
> > > >>>> form
> > > >>>> >>> of
> > > >>>> >>>>>>>>> incremental computing, not only the small-delta
> > append-only
> > > >>>> data
> > > >>>> >>>>>>> model in
> > > >>>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes
> > > the
> > > >>>> >>> current
> > > >>>> >>>>>>>>> streaming support, as well as any IVM implementations.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> About the cost model, we did not come up with a seperate
> > > cost
> > > >>>> >>> model,
> > > >>>> >>>>>>> but
> > > >>>> >>>>>>>>> rather extended the existing one. Similar to
> > multi-objective
> > > >>>> >>>>>>>> optimization,
> > > >>>> >>>>>>>>> costs incurred at different time points are considered
> > > >>>> different
> > > >>>> >>>>>>>>> dimensions. Tempura lets users supply a function that
> > > >>>> converts this
> > > >>>> >>>>>>> cost
> > > >>>> >>>>>>>>> vector into a final cost. So under this function, any two
> > > >>>> >>> incremental
> > > >>>> >>>>>>>> plans
> > > >>>> >>>>>>>>> are still comparable and there is an overall optimum. I
> > > guess
> > > >>>> we
> > > >>>> >>> can
> > > >>>> >>>>>>> go
> > > >>>> >>>>>>>>> down the route of multi-objective parametric query
> > > >>>> optimization
> > > >>>> >>>>>>> instead
> > > >>>> >>>>>>>> if
> > > >>>> >>>>>>>>> there is a need.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Next on materialized views and multi-query optimization,
> > > >>>> since our
> > > >>>> >>>>>>>>> multi-time-point plan naturally involves materializing
> > > >>>> intermediate
> > > >>>> >>>>>>>> results
> > > >>>> >>>>>>>>> for later time points, we need to solve the problem of
> > > >>>> choosing
> > > >>>> >>>>>>>>> materializations and include the cost of saving and
> > reusing
> > > >>>> the
> > > >>>> >>>>>>>>> materializations when costing and comparing plans. We
> > > >>>> borrowed the
> > > >>>> >>>>>>>>> multi-query optimization techniques to solve this problem
> > > even
> > > >>>> >>> though
> > > >>>> >>>>>>> we
> > > >>>> >>>>>>>>> are looking at a single query. As a result, we think our
> > > work
> > > >>>> is
> > > >>>> >>>>>>>> orthogonal
> > > >>>> >>>>>>>>> to Calcite's facilities around utilizing existing views,
> > > >>>> lattice
> > > >>>> >>> etc.
> > > >>>> >>>>>>> We
> > > >>>> >>>>>>>> do
> > > >>>> >>>>>>>>> feel that the multi-query optimization component can be
> > > >>>> adopted to
> > > >>>> >>>>>>> wider
> > > >>>> >>>>>>>>> use, but probably need more suggestions from the
> > community.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Lastly, our current implementation is set up in java code,
> > > it
> > > >>>> >>> should
> > > >>>> >>>>>>> be
> > > >>>> >>>>>>>>> straightforward to hook it up with SQL shell.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Thanks,
> > > >>>> >>>>>>>>> Botong
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > > >>>> >>> jhyde.apa...@gmail.com>
> > > >>>> >>>>>>>>> wrote:
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>>> Botong,
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> This is very exciting; congratulations on this research,
> > > and
> > > >>>> thank
> > > >>>> >>>>>>> you
> > > >>>> >>>>>>>>>> for contributing it back to Calcite.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> > > >>>> >>>>>>> materialized
> > > >>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we
> > have
> > > >>>> already
> > > >>>> >>>>>>> some
> > > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
> > > >>>> operators,
> > > >>>> >>>>>>> lattice,
> > > >>>> >>>>>>>>>> and Spool operator), it will be interesting to see
> > whether
> > > >>>> we can
> > > >>>> >>>>>>> make
> > > >>>> >>>>>>>> them
> > > >>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> Your work differs from streaming queries in that your
> > > >>>> relations
> > > >>>> >>> are
> > > >>>> >>>>>>> used
> > > >>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
> > > >>>> queries, the
> > > >>>> >>>>>>> only
> > > >>>> >>>>>>>>>> activity is the change propagation. Did you find that you
> > > >>>> needed
> > > >>>> >>> two
> > > >>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
> > > >>>> another for
> > > >>>> >>>>>>> “user
> > > >>>> >>>>>>>>>> queries” - since the objectives of each activity are so
> > > >>>> different?
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> > > >>>> >>> multi-objective
> > > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> I will make time over the next few days to read and
> > digest
> > > >>>> your
> > > >>>> >>>>>>> paper.
> > > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process
> > to
> > > >>>> create
> > > >>>> >>>>>>>>>> something that will be useful for the broader community.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> One thing will be particularly useful: making this
> > > >>>> functionality
> > > >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
> > > >>>> with
> > > >>>> >>> this
> > > >>>> >>>>>>>>>> functionality without writing Java code or setting up
> > > complex
> > > >>>> >>>>>>> databases
> > > >>>> >>>>>>>> and
> > > >>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
> > > >>>> operations
> > > >>>> >>>>>>> that
> > > >>>> >>>>>>>> are
> > > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether
> > we
> > > >>>> could
> > > >>>> >>>>>>> devise
> > > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> Julian
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> [1]
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>
> > > >>>>
> > >
> > https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > > pku...@gmail.com
> > > >>>> >
> > > >>>> >>>>>>> wrote:
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure,
> > > please
> > > >>>> >>> refer
> > > >>>> >>>>>>> to
> > > >>>> >>>>>>>>>> Fig
> > > >>>> >>>>>>>>>>> 3(a) in our paper:
> > > >>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> Best,
> > > >>>> >>>>>>>>>>> Botong
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > > >>>> taojia...@gmail.com>
> > > >>>> >>>>>>>> wrote:
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Seems interesting, the pic can not be seen in the mail,
> > > >>>> may you
> > > >>>> >>>>>>> open
> > > >>>> >>>>>>>> a
> > > >>>> >>>>>>>>>> JIRA
> > > >>>> >>>>>>>>>>>> for this, people who are interested in this can
> > subscribe
> > > >>>> to the
> > > >>>> >>>>>>>> JIRA?
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Regards!
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Aron Tao
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Botong Huang <bot...@apache.org> 于2020年12月24日周四
> > > 上午3:18写道:
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Hi all,
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer
> > into
> > > a
> > > >>>> >>> general
> > > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research
> > paper
> > > >>>> >>>>>>> published
> > > >>>> >>>>>>>> in
> > > >>>> >>>>>>>>>>>> VLDB
> > > >>>> >>>>>>>>>>>>> 2021:
> > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
> > > >>>> >>> incremental
> > > >>>> >>>>>>>> data
> > > >>>> >>>>>>>>>>>>> processing
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
> > > >>>> Alibaba’s
> > > >>>> >>>>>>> data
> > > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
> > > >>>> optimizer
> > > >>>> >>> to
> > > >>>> >>>>>>>>>>>> alleviate
> > > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
> > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
> > > >>>> >>> Incremental
> > > >>>> >>>>>>>>>>>> Computing
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> To our best knowledge, this is the first general
> > > >>>> cost-based
> > > >>>> >>>>>>>>>> incremental
> > > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
> > > >>>> families
> > > >>>> >>> of
> > > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
> > Streaming,
> > > >>>> >>>>>>> DBToaster,
> > > >>>> >>>>>>>>>> etc.
> > > >>>> >>>>>>>>>>>>> Experiments (in the paper) shows that the generated
> > best
> > > >>>> plan
> > > >>>> >>> is
> > > >>>> >>>>>>>>>>>>> consistently much better than the plans from each
> > > >>>> individual
> > > >>>> >>>>>>> method
> > > >>>> >>>>>>>>>>>> alone.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to
> > > >>>> database
> > > >>>> >>>>>>> view
> > > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and are
> > being
> > > >>>> >>> adopted
> > > >>>> >>>>>>> in
> > > >>>> >>>>>>>>>>>> active
> > > >>>> >>>>>>>>>>>>> databases, resumable query execution, approximate
> > query
> > > >>>> >>>>>>> processing,
> > > >>>> >>>>>>>>>> etc.
> > > >>>> >>>>>>>>>>>> We
> > > >>>> >>>>>>>>>>>>> are hoping that this feature can help widening the
> > > >>>> spectrum of
> > > >>>> >>>>>>>>>> Calcite,
> > > >>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
> > > >>>> Please
> > > >>>> >>>>>>> refer
> > > >>>> >>>>>>>> to
> > > >>>> >>>>>>>>>>>> the
> > > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working
> > on a
> > > >>>> >>> journal
> > > >>>> >>>>>>>>>> version
> > > >>>> >>>>>>>>>>>> of
> > > >>>> >>>>>>>>>>>>> the paper with more implementation details.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant
> > > to
> > > >>>> be
> > > >>>> >>>>>>>> executed
> > > >>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo
> > will
> > > >>>> be
> > > >>>> >>>>>>> extended
> > > >>>> >>>>>>>>>> with
> > > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
> > generating
> > > >>>> >>>>>>> incremental
> > > >>>> >>>>>>>>>>>> plans
> > > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
> > different
> > > >>>> time
> > > >>>> >>>>>>> points.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that
> > changes
> > > >>>> over
> > > >>>> >>> time
> > > >>>> >>>>>>>>>> (Time
> > > >>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> > introduced
> > > >>>> >>>>>>> TvrMetaSet
> > > >>>> >>>>>>>>>> into
> > > >>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track
> > > >>>> related
> > > >>>> >>>>>>> RelSets
> > > >>>> >>>>>>>>>> of a
> > > >>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain
> > > >>>> time,
> > > >>>> >>>>>>> delta of
> > > >>>> >>>>>>>>>> the
> > > >>>> >>>>>>>>>>>>> table between two time points, etc.).
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> [image: image.png]
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> For example in the above figure, each vertical line
> > is a
> > > >>>> >>>>>>> TvrMetaSet
> > > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
> > > >>>> >>> Horizontal
> > > >>>> >>>>>>>> lines
> > > >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a
> > RelSet.
> > > >>>> Users
> > > >>>> >>> can
> > > >>>> >>>>>>>>>> write
> > > >>>> >>>>>>>>>>>> TVR
> > > >>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations
> > between
> > > >>>> these
> > > >>>> >>>>>>> dots.
> > > >>>> >>>>>>>>>> For
> > > >>>> >>>>>>>>>>>>> example, the blues lines are inter-TVR rules that
> > > >>>> describe how
> > > >>>> >>> to
> > > >>>> >>>>>>>>>> compute
> > > >>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs.
> > The
> > > >>>> red
> > > >>>> >>> lines
> > > >>>> >>>>>>>> are
> > > >>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a
> > > >>>> TVR. All
> > > >>>> >>>>>>> TVR
> > > >>>> >>>>>>>>>>>> rewrite
> > > >>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules
> > > still
> > > >>>> work
> > > >>>> >>> in
> > > >>>> >>>>>>>> the
> > > >>>> >>>>>>>>>> new
> > > >>>> >>>>>>>>>>>>> volcano system without modification.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four
> > parts:
> > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet
> > > and
> > > >>>> >>>>>>> RelNodes,
> > > >>>> >>>>>>>>>> as
> > > >>>> >>>>>>>>>>>>> well as links in between the nodes.
> > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded
> > > >>>> rule
> > > >>>> >>>>>>> engine
> > > >>>> >>>>>>>>>> API.
> > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
> > > >>>> incremental
> > > >>>> >>>>>>> plan
> > > >>>> >>>>>>>>>>>>> involving multiple time points.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and
> > > thus
> > > >>>> when
> > > >>>> >>>>>>>>>> disabled,
> > > >>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied
> > this
> > > >>>> >>>>>>>>>> Calcite-extended
> > > >>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic
> > query
> > > >>>> called
> > > >>>> >>>>>>> the
> > > >>>> >>>>>>>>>>>> ‘‘range
> > > >>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
> > > >>>> savings
> > > >>>> >>> of
> > > >>>> >>>>>>> 80%
> > > >>>> >>>>>>>>>> on
> > > >>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> > end-to-end
> > > >>>> >>> execution
> > > >>>> >>>>>>>>>> time.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and
> > > happy
> > > >>>> >>>>>>> holidays!
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Best,
> > > >>>> >>>>>>>>>>>>> Botong
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>>>>> --
> > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > > >>>> >>>>>>> no mistakes
> > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > > >>>> >>>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>
> > > >>>> >>
> > > >>>>
> > > >>>>
> > >
> >
> >
> > --
> > Viliam Durina
> > Jet Developer
> >       hazelcast®
> >
> >   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
> > USA
> > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
> >
> > --
> > This message contains confidential information and is intended only for
> > the
> > individuals named. If you are not the named addressee you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately by e-mail if you have received this e-mail by mistake and
> > delete this e-mail from your system. E-mail transmission cannot be
> > guaranteed to be secure or error-free as information could be intercepted,
> > corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> > The sender therefore does not accept liability for any errors or omissions
> > in the contents of this message, which arise as a result of e-mail
> > transmission. If verification is required, please request a hard-copy
> > version. -Hazelcast
> >

Reply via email to