We didn't record it, but we will try to record future meetings. Please
add your time preference to the doc so that we can find a meeting time
that works for more people.

Thanks,
Botong

On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <[email protected]> wrote:

> Is there a recording available?
> Viliam
>
> On Wed, 28 Apr 2021 at 00:15, Botong Huang <[email protected]> wrote:
>
> > Hi all,
> >
> > The meeting yesterday was fun and productive. As discussed, this is the
> > call to schedule our second meeting.
> >
> > We encourage everyone to add their time preferences during 05/01 - 05/15
> > here:
> >
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >
> > Thanks,
> > Botong
> >
> > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <[email protected]> wrote:
> >
> > > Hi all,
> > > We've created a zoom meeting below for our meeting next Monday
> > > (9pm-10:30pm PST on 04/26).
> > > Talk to you all soon!
> > >
> > > Join Zoom Meeting
> > > https://uci.zoom.us/j/91279732686
> > > <
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > >
> > >
> > > Meeting ID: 912 7973 2686
> > > One tap mobile
> > > +16699006833,,91279732686# US (San Jose)
> > > +12532158782,,91279732686# US (Tacoma)
> > >
> > > Dial by your location
> > > +1 669 900 6833 US (San Jose)
> > > +1 253 215 8782 US (Tacoma)
> > > +1 346 248 7799 US (Houston)
> > > +1 301 715 8592 US (Washington DC)
> > > +1 312 626 6799 US (Chicago)
> > > +1 646 558 8656 US (New York)
> > > Meeting ID: 912 7973 2686
> > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > <
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > >
> > >
> > > Join by Skype for Business
> > > https://uci.zoom.us/skype/91279732686
> > > <
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > >
> > >
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <[email protected]>
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> According to the preferences collected, we are tentatively scheduling
> > our
> > >> meeting at 9pm-10:30pm PST on 04/26 Monday.
> > >>
> > >> We will give a presentation about Tempura, followed by a free
> > discussion.
> > >>
> > >> Please let us know if there are any other requests. A few days before
> > >> the meeting, I will send out a zoom meeting link.
> > >>
> > >> Thanks,
> > >> Botong
> > >>
> > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <[email protected]> wrote:
> > >>
> > >>> Hi Julian and all,
> > >>>
> > >>> We've posted the Tempura code base below. Feel free to take a quick
> > peek
> > >>> at the last five commits.
> > >>>
> > https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > >>>
> > >>> I've also opened a Jira (CALCITE-4568
> > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will
> > serve
> > >>> as the umbrella Jira for the feature.
> > >>>
> > >>> In the meantime, we encourage everyone to enter the time preferences
> > for
> > >>> our first meeting here:
> > >>>
> > >>>
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>
> > >>> Thanks,
> > >>> Botong
> > >>>
> > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> I have added my time preferences to the doc.
> > >>>>
> > >>>> Before we meet, could you publish a PR for us to review?
> > >>>>
> > >>>> Initial discussions will need to be about architecture and
> high-level
> > >>>> design. So I would ask Calcite reviewers not to review the PR
> > line-by-line
> > >>>> (or to leave comments in GitHub) but try to understand the design
> > >>>> holistically, and prepare questions/comments before the meeting.
> > >>>>
> > >>>> Botong, Can you please create a Calcite JIRA case for this task?
> JIRA
> > >>>> is how we track long-running tasks such as this.
> > >>>>
> > >>>> Julian
> > >>>>
> > >>>>
> > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <[email protected]>
> wrote:
> > >>>> >
> > >>>> > Hi all,
> > >>>> >
> > >>>> > Apologies for the delay. It took us some time to clean up our code
> > base
> > >>>> and
> > >>>> > publicly release it (which will be out soon) for a quick peek.
> > >>>> >
> > >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> > >>>> > meeting and discuss how to integrate Tempura into Calcite.
> > >>>> >
> > >>>> > Since some of our team members are in China, we prefer the time
> slot
> > >>>> of
> > >>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
> > >>>> shared
> > >>>> > doc below.
> > >>>> >
> > >>>>
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>> >
> > >>>> > We encourage everyone to add their time preferences (during
> > >>>> 04/15-04/30) in
> > >>>> > this doc. In a week or so, we will try to settle on a time that works
> > for
> > >>>> > most.
> > >>>> >
> > >>>> > Thanks,
> > >>>> > Botong
> > >>>> >
> > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <[email protected]>
> > >>>> wrote:
> > >>>> >
> > >>>> >> Hi Julian and Rui,
> > >>>> >>
> > >>>> >> Sounds good to us. Please give us some time to prepare some
> slides
> > >>>> for the
> > >>>> >> meeting.
> > >>>> >>
> > >>>> >> I've created a doc below for discussion. Please feel free to add
> > >>>> more in
> > >>>> >> here:
> > >>>> >>
> > >>>> >>
> > >>>>
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>> >>
> > >>>> >> Thanks,
> > >>>> >> Botong
> > >>>> >>
> > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > [email protected]
> > >>>> >
> > >>>> >> wrote:
> > >>>> >>
> > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
> > >>>> think we
> > >>>> >>> should create it to continue discussion after the first meeting.
> > >>>> >>>
> > >>>> >>> Julian
> > >>>> >>>
> > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > [email protected]>
> > >>>> >>> wrote:
> > >>>> >>>>
> > >>>> >>>> I think good next steps would be a PR and a meeting. The PR
> will
> > >>>> allow
> > >>>> >>> us to read the code, but I think we should do the first round of
> > >>>> questions
> > >>>> >>> at the meeting.  The meeting could perhaps start with a
> > >>>> presentation of the
> > >>>> >>> paper (do you have some slides you are planning to present at
> > VLDB,
> > >>>> >>> Botong?) and then move on to questions about the concepts, which
> > >>>> >>> alternatives were considered, and how the concepts map onto
> other
> > >>>> current
> > >>>> >>> and future concepts in calcite.
> > >>>> >>>>
> > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line
> at
> > >>>> this
> > >>>> >>> point. We need to understand the high-level concepts and design
> > >>>> choices. If
> > >>>> >>> we start reviewing the PR we will get lost in the details.
> > >>>> >>>>
> > >>>> >>>> I know that integrating a major change is hard; I doubt that we
> > >>>> will be
> > >>>> >>> able to integrate everything, but we can build understanding
> about
> > >>>> where
> > >>>> >>> calcite needs to go, and I hope integrate a good amount of code
> to
> > >>>> help us
> > >>>> >>> get there.
> > >>>> >>>>
> > >>>> >>>> As I said before, after the integration I would like people to
> be
> > >>>> able
> > >>>> >>> to experiment with it and use it in their production systems.
> > That
> > >>>> way, it
> > >>>> >>> will not be an experiment that withers, but a feature set
> > >>>> that integrates with
> > >>>> >>> other calcite features and gets stronger over time.
> > >>>> >>>>
> > >>>> >>>> Julian
> > >>>> >>>>
> > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <[email protected]>
> > >>>> wrote:
> > >>>> >>>>>
> > >>>> >>>>> For me to participate in the discussion for the above
> > questions,
> > >>>> I
> > >>>> >>> will
> > >>>> >>>>> need to read a lot more to know relevant context and likely
> ask
> > >>>> lots of
> > >>>> >>>>> questions :-).  An editable doc is probably good for questions
> > and
> > >>>> back
> > >>>> >>> and
> > >>>> >>>>> forth discussion.
> > >>>> >>>>>
> > >>>> >>>>>
> > >>>> >>>>> -Rui
> > >>>> >>>>>
> > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > [email protected]
> > >>>> >
> > >>>> >>> wrote:
> > >>>> >>>>>>
> > >>>> >>>>>> I am also happy to help push this work into Calcite (review
> > code
> > >>>> and
> > >>>> >>> doc,
> > >>>> >>>>>> etc.).
> > >>>> >>>>>>
> > >>>> >>>>>> While you can share your code so people can have more idea
> how
> > >>>> it is
> > >>>> >>>>>> implemented, I think it would be also nice to have a doc to
> > >>>> discuss
> > >>>> >>> open
> > >>>> >>>>>> questions above. I have copied some of those points here:
> > >>>> >>>>>>
> > >>>> >>>>>> 1. Can this solution be compatible with existing solutions in
> > >>>> Calcite
> > >>>> >>>>>> Streaming, materialized view maintenance, and multi-query
> > >>>> optimization
> > >>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool
> > >>>> operator),
> > >>>> >>>>>> 2. Did you find that you needed two separate cost models -
> one
> > >>>> for
> > >>>> >>> “view
> > >>>> >>>>>> maintenance” and another for “user queries” - since the
> > >>>> objectives of
> > >>>> >>> each
> > >>>> >>>>>> activity are so different?
> > >>>> >>>>>> 3. whether this work will hasten the arrival of
> multi-objective
> > >>>> >>> parametric
> > >>>> >>>>>> query optimization [1] in Calcite.
> > >>>> >>>>>> 4. probably SQL shell support.
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>> [1]:
> > >>>> >>>>>>
> > >>>> >>>
> > >>>>
> >
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>> -Rui
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <[email protected]>
> > >>>> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>> it would be very nice to see a POC of your work.
> > >>>> >>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > >>>> [email protected]>
> > >>>> >>> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>>> Hi Julian,
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering
> if
> > it
> > >>>> >>> would
> > >>>> >>>>>>> help
> > >>>> >>>>>>>> to post our code for a quick preview.
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> Thanks,
> > >>>> >>>>>>>> Botong
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > [email protected]
> > >>>> >
> > >>>> >>> wrote:
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>> Hi Julian,
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan
> that
> > >>>> best
> > >>>> >>>>>>> benefits
> > >>>> >>>>>>>>> the community. Here are some clarifications that hopefully
> > >>>> answer
> > >>>> >>> your
> > >>>> >>>>>>>>> questions.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time
> points
> > to
> > >>>> >>>>>>> consider
> > >>>> >>>>>>>>> running and a cost function that expresses users'
> preference
> > >>>> over
> > >>>> >>>>>>> time,
> > >>>> >>>>>>>>> Tempura will generate the best incremental plan that
> > >>>> minimizes the
> > >>>> >>>>>>>> overall
> > >>>> >>>>>>>>> cost function.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time
> > >>>> points
> > >>>> >>> can
> > >>>> >>>>>>> be
> > >>>> >>>>>>>>> different from each other, as opposed to identical plans
> in
> > >>>> all
> > >>>> >>> delta
> > >>>> >>>>>>>> runs
> > >>>> >>>>>>>>> as in streaming or IVM. As mentioned in Section 2.1 of the
> Tempura
> > >>>> paper,
> > >>>> >>> we
> > >>>> >>>>>>> can
> > >>>> >>>>>>>>> mimic the current streaming implementation by specifying
> two
> > >>>> >>> (logical)
> > >>>> >>>>>>>> time
> > >>>> >>>>>>>>> points in Tempura, representing the initial run and later
> > >>>> delta
> > >>>> >>> runs
> > >>>> >>>>>>>>> respectively. In general, note that Tempura supports
> various
> > >>>> forms
> > >>>> >>> of
> > >>>> >>>>>>>>> incremental computing, not only the small-delta
> append-only
> > >>>> data
> > >>>> >>>>>>> model in
> > >>>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes
> > the
> > >>>> >>> current
> > >>>> >>>>>>>>> streaming support, as well as any IVM implementations.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> About the cost model, we did not come up with a separate
> > cost
> > >>>> >>> model,
> > >>>> >>>>>>> but
> > >>>> >>>>>>>>> rather extended the existing one. Similar to
> multi-objective
> > >>>> >>>>>>>> optimization,
> > >>>> >>>>>>>>> costs incurred at different time points are considered
> > >>>> different
> > >>>> >>>>>>>>> dimensions. Tempura lets users supply a function that
> > >>>> converts this
> > >>>> >>>>>>> cost
> > >>>> >>>>>>>>> vector into a final cost. So under this function, any two
> > >>>> >>> incremental
> > >>>> >>>>>>>> plans
> > >>>> >>>>>>>>> are still comparable and there is an overall optimum. I
> > guess
> > >>>> we
> > >>>> >>> can
> > >>>> >>>>>>> go
> > >>>> >>>>>>>>> down the route of multi-objective parametric query
> > >>>> optimization
> > >>>> >>>>>>> instead
> > >>>> >>>>>>>> if
> > >>>> >>>>>>>>> there is a need.
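
A minimal sketch of the cost-vector idea in the paragraph above, assuming one cost entry per time point and a user-supplied function that collapses the vector into a single comparable number. This is not Tempura's or Calcite's API: the names `weighted_cost` and `best_plan`, the plan labels, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Sequence

# One cost per time point, e.g. [cost at t1, cost at t2].
CostVector = Sequence[float]


def weighted_cost(weights: Sequence[float]) -> Callable[[CostVector], float]:
    """Build a user cost function: a weighted sum over time points (an assumption)."""
    def cost(vector: CostVector) -> float:
        if len(vector) != len(weights):
            raise ValueError("cost vector and weights must cover the same time points")
        return sum(w * c for w, c in zip(weights, vector))
    return cost


def best_plan(plans: Dict[str, CostVector],
              cost_fn: Callable[[CostVector], float]) -> str:
    """Return the name of the plan with the smallest final (scalar) cost."""
    return min(plans, key=lambda name: cost_fn(plans[name]))


# Hypothetical candidates: do all the work at the last time point,
# or spread it across both time points.
plans = {
    "recompute_at_end": [0.0, 10.0],
    "incremental": [4.0, 3.0],
}

late_is_expensive = weighted_cost([1.0, 2.0])   # work at t2 costs double
early_is_expensive = weighted_cost([5.0, 1.0])  # work at t1 costs five times more

print(best_plan(plans, late_is_expensive))
print(best_plan(plans, early_is_expensive))
```

Because every plan is reduced to one number under the same function, any two plans stay comparable; changing the weights changes which plan wins without changing the plans themselves.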
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Next on materialized views and multi-query optimization,
> > >>>> since our
> > >>>> >>>>>>>>> multi-time-point plan naturally involves materializing
> > >>>> intermediate
> > >>>> >>>>>>>> results
> > >>>> >>>>>>>>> for later time points, we need to solve the problem of
> > >>>> choosing
> > >>>> >>>>>>>>> materializations and include the cost of saving and
> reusing
> > >>>> the
> > >>>> >>>>>>>>> materializations when costing and comparing plans. We
> > >>>> borrowed the
> > >>>> >>>>>>>>> multi-query optimization techniques to solve this problem
> > even
> > >>>> >>> though
> > >>>> >>>>>>> we
> > >>>> >>>>>>>>> are looking at a single query. As a result, we think our
> > work
> > >>>> is
> > >>>> >>>>>>>> orthogonal
> > >>>> >>>>>>>>> to Calcite's facilities around utilizing existing views,
> > >>>> lattice
> > >>>> >>> etc.
> > >>>> >>>>>>> We
> > >>>> >>>>>>>> do
> > >>>> >>>>>>>>> feel that the multi-query optimization component can be
> > >>>> adopted to
> > >>>> >>>>>>> wider
> > >>>> >>>>>>>>> use, but probably need more suggestions from the
> community.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code, so
> > it
> > >>>> >>> should
> > >>>> >>>>>>> be
> > >>>> >>>>>>>>> straightforward to hook it up with SQL shell.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Thanks,
> > >>>> >>>>>>>>> Botong
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > >>>> >>> [email protected]>
> > >>>> >>>>>>>>> wrote:
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>>> Botong,
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> This is very exciting; congratulations on this research,
> > and
> > >>>> thank
> > >>>> >>>>>>> you
> > >>>> >>>>>>>>>> for contributing it back to Calcite.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> > >>>> >>>>>>> materialized
> > >>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we
> have
> > >>>> already
> > >>>> >>>>>>> some
> > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
> > >>>> operators,
> > >>>> >>>>>>> lattice,
> > >>>> >>>>>>>>>> and Spool operator), it will be interesting to see
> whether
> > >>>> we can
> > >>>> >>>>>>> make
> > >>>> >>>>>>>> them
> > >>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> Your work differs from streaming queries in that your
> > >>>> relations
> > >>>> >>> are
> > >>>> >>>>>>> used
> > >>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
> > >>>> queries, the
> > >>>> >>>>>>> only
> > >>>> >>>>>>>>>> activity is the change propagation. Did you find that you
> > >>>> needed
> > >>>> >>> two
> > >>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
> > >>>> another for
> > >>>> >>>>>>> “user
> > >>>> >>>>>>>>>> queries” - since the objectives of each activity are so
> > >>>> different?
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> > >>>> >>> multi-objective
> > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> I will make time over the next few days to read and
> digest
> > >>>> your
> > >>>> >>>>>>> paper.
> > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process
> to
> > >>>> create
> > >>>> >>>>>>>>>> something that will be useful for the broader community.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> One thing will be particularly useful: making this
> > >>>> functionality
> > >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
> > >>>> with
> > >>>> >>> this
> > >>>> >>>>>>>>>> functionality without writing Java code or setting up
> > complex
> > >>>> >>>>>>> databases
> > >>>> >>>>>>>> and
> > >>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
> > >>>> operations
> > >>>> >>>>>>> that
> > >>>> >>>>>>>> are
> > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether
> we
> > >>>> could
> > >>>> >>>>>>> devise
> > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> Julian
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> [1]
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>
> > >>>>
> >
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > [email protected]
> > >>>> >
> > >>>> >>>>>>> wrote:
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure,
> > please
> > >>>> >>> refer
> > >>>> >>>>>>> to
> > >>>> >>>>>>>>>> Fig
> > >>>> >>>>>>>>>>> 3(a) in our paper:
> > >>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>> Botong
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > >>>> [email protected]>
> > >>>> >>>>>>>> wrote:
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>>> Seems interesting, the pic can not be seen in the mail,
> > >>>> may you
> > >>>> >>>>>>> open
> > >>>> >>>>>>>> a
> > >>>> >>>>>>>>>> JIRA
> > >>>> >>>>>>>>>>>> for this, people who are interested in this can
> subscribe
> > >>>> to the
> > >>>> >>>>>>>> JIRA?
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>> Regards!
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>> Aron Tao
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Hi all,
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer
> into
> > a
> > >>>> >>> general
> > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research
> paper
> > >>>> >>>>>>> published
> > >>>> >>>>>>>> in
> > >>>> >>>>>>>>>>>> VLDB
> > >>>> >>>>>>>>>>>>> 2021:
> > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
> > >>>> >>> incremental
> > >>>> >>>>>>>> data
> > >>>> >>>>>>>>>>>>> processing
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
> > >>>> Alibaba’s
> > >>>> >>>>>>> data
> > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
> > >>>> optimizer
> > >>>> >>> to
> > >>>> >>>>>>>>>>>> alleviate
> > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
> > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
> > >>>> >>> Incremental
> > >>>> >>>>>>>>>>>> Computing
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general
> > >>>> cost-based
> > >>>> >>>>>>>>>> incremental
> > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
> > >>>> families
> > >>>> >>> of
> > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
> Streaming,
> > >>>> >>>>>>> DBToaster,
> > >>>> >>>>>>>>>> etc.
> > >>>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated
> best
> > >>>> plan
> > >>>> >>> is
> > >>>> >>>>>>>>>>>>> consistently much better than the plans from each
> > >>>> individual
> > >>>> >>>>>>> method
> > >>>> >>>>>>>>>>>> alone.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to
> > >>>> database
> > >>>> >>>>>>> view
> > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is
> being
> > >>>> >>> adopted
> > >>>> >>>>>>> in
> > >>>> >>>>>>>>>>>> active
> > >>>> >>>>>>>>>>>>> databases, resumable query execution, approximate
> query
> > >>>> >>>>>>> processing,
> > >>>> >>>>>>>>>> etc.
> > >>>> >>>>>>>>>>>> We
> > >>>> >>>>>>>>>>>>> are hoping that this feature can help widen the
> > >>>> spectrum of
> > >>>> >>>>>>>>>> Calcite,
> > >>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
> > >>>> Please
> > >>>> >>>>>>> refer
> > >>>> >>>>>>>> to
> > >>>> >>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working
> on a
> > >>>> >>> journal
> > >>>> >>>>>>>>>> version
> > >>>> >>>>>>>>>>>> of
> > >>>> >>>>>>>>>>>>> the paper with more implementation details.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant
> > to
> > >>>> be
> > >>>> >>>>>>>> executed
> > >>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo
> will
> > >>>> be
> > >>>> >>>>>>> extended
> > >>>> >>>>>>>>>> with
> > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
> generating
> > >>>> >>>>>>> incremental
> > >>>> >>>>>>>>>>>> plans
> > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
> different
> > >>>> time
> > >>>> >>>>>>> points.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that
> changes
> > >>>> over
> > >>>> >>> time
> > >>>> >>>>>>>>>> (Time
> > >>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> introduced
> > >>>> >>>>>>> TvrMetaSet
> > >>>> >>>>>>>>>> into
> > >>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track
> > >>>> related
> > >>>> >>>>>>> RelSets
> > >>>> >>>>>>>>>> of a
> > >>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain
> > >>>> time,
> > >>>> >>>>>>> delta of
> > >>>> >>>>>>>>>> the
> > >>>> >>>>>>>>>>>>> table between two time points, etc.).
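
A minimal sketch of the snapshot and delta views of a time-varying relation mentioned above, assuming rows are tracked as a multiset with signed counts so that deletions can be expressed. It only illustrates the concept; `TimeVaryingRelation` and `apply_delta` are hypothetical names and do not correspond to TvrMetaSet or any class in Calcite or Tempura.

```python
from collections import Counter


class TimeVaryingRelation:
    """A table recorded as a log of timestamped row changes (hypothetical model)."""

    def __init__(self):
        self._changes = []  # list of (time, row, signed count)

    def change(self, time, row, count=1):
        """Record an insert (count > 0) or a delete (count < 0) at the given time."""
        self._changes.append((time, row, count))

    def delta(self, start, end):
        """Net row changes in the half-open interval (start, end]."""
        net = Counter()
        for time, row, count in self._changes:
            if start < time <= end:
                net[row] += count
        return {row: n for row, n in net.items() if n != 0}

    def snapshot(self, time):
        """Table contents at `time`: every change up to and including it."""
        return self.delta(float("-inf"), time)


def apply_delta(snapshot, delta):
    """Combine an earlier snapshot with a delta to obtain the later snapshot."""
    merged = Counter(snapshot)
    for row, n in delta.items():
        merged[row] += n
    return {row: n for row, n in merged.items() if n != 0}


orders = TimeVaryingRelation()
orders.change(1, ("o1", 10))       # insert at t=1
orders.change(2, ("o2", 20))       # insert at t=2
orders.change(3, ("o1", 10), -1)   # delete at t=3

print(orders.snapshot(2))
print(orders.delta(2, 3))
print(apply_delta(orders.snapshot(2), orders.delta(2, 3)) == orders.snapshot(3))
```

The last line checks the relationship that makes incremental plans possible in this model: a later snapshot can be obtained from an earlier snapshot plus the delta between the two time points.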
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> [image: image.png]
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> For example in the above figure, each vertical line
> is a
> > >>>> >>>>>>> TvrMetaSet
> > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
> > >>>> >>> Horizontal
> > >>>> >>>>>>>> lines
> > >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a
> RelSet.
> > >>>> Users
> > >>>> >>> can
> > >>>> >>>>>>>>>> write
> > >>>> >>>>>>>>>>>> TVR
> > >>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations
> between
> > >>>> these
> > >>>> >>>>>>> dots.
> > >>>> >>>>>>>>>> For
> > >>>> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules that
> > >>>> describe how
> > >>>> >>> to
> > >>>> >>>>>>>>>> compute
> > >>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs.
> The
> > >>>> red
> > >>>> >>> lines
> > >>>> >>>>>>>> are
> > >>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a
> > >>>> TVR. All
> > >>>> >>>>>>> TVR
> > >>>> >>>>>>>>>>>> rewrite
> > >>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules
> > still
> > >>>> work
> > >>>> >>> in
> > >>>> >>>>>>>> the
> > >>>> >>>>>>>>>> new
> > >>>> >>>>>>>>>>>>> volcano system without modification.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four
> parts:
> > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet
> > and
> > >>>> >>>>>>> RelNodes,
> > >>>> >>>>>>>>>> as
> > >>>> >>>>>>>>>>>>> well as links in between the nodes.
> > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded
> > >>>> rule
> > >>>> >>>>>>> engine
> > >>>> >>>>>>>>>> API.
> > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
> > >>>> incremental
> > >>>> >>>>>>> plan
> > >>>> >>>>>>>>>>>>> involving multiple time points.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and
> > thus
> > >>>> when
> > >>>> >>>>>>>>>> disabled,
> > >>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied
> this
> > >>>> >>>>>>>>>> Calcite-extended
> > >>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic
> query
> > >>>> called
> > >>>> >>>>>>> the
> > >>>> >>>>>>>>>>>> ‘‘range
> > >>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
> > >>>> savings
> > >>>> >>> of
> > >>>> >>>>>>> 80%
> > >>>> >>>>>>>>>> on
> > >>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> end-to-end
> > >>>> >>> execution
> > >>>> >>>>>>>>>> time.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and
> > happy
> > >>>> >>>>>>> holidays!
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>> Botong
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>>>>> --
> > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > >>>> >>>>>>> no mistakes
> > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > >>>> >>>>>>>
> > >>>> >>>>>>
> > >>>> >>>
> > >>>> >>
> > >>>>
> > >>>>
> >
>
>
> --
> Viliam Durina
> Jet Developer
>       hazelcast®
>
>   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
> USA
> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
>
>
