Is there a recording available?
Viliam

On Wed, 28 Apr 2021 at 00:15, Botong Huang <pku...@gmail.com> wrote:

> Hi all,
>
> The meeting yesterday was fun and productive. As discussed, this is the
> call to schedule our second meeting.
>
> We encourage everyone to add their time preferences during 05/01 - 05/15
> here:
>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>
> Thanks,
> Botong
>
> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pku...@gmail.com> wrote:
>
> > Hi all,
> > We've created the Zoom meeting below for our meeting next Monday
> > (9pm-10:30pm PST on 04/26).
> > Talk to you all soon!
> >
> > Join Zoom Meeting
> > https://uci.zoom.us/j/91279732686
> >
> > Meeting ID: 912 7973 2686
> > One tap mobile
> > +16699006833,,91279732686# US (San Jose)
> > +12532158782,,91279732686# US (Tacoma)
> >
> > Dial by your location
> > +1 669 900 6833 US (San Jose)
> > +1 253 215 8782 US (Tacoma)
> > +1 346 248 7799 US (Houston)
> > +1 301 715 8592 US (Washington DC)
> > +1 312 626 6799 US (Chicago)
> > +1 646 558 8656 US (New York)
> > Meeting ID: 912 7973 2686
> > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> >
> > Join by Skype for Business
> > https://uci.zoom.us/skype/91279732686
> >
> >
> > Thanks,
> > Botong
> >
> > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pku...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> According to the preferences collected, we are tentatively scheduling our
> >> meeting for Monday, 04/26, 9pm-10:30pm PST.
> >>
> >> We will give a presentation about Tempura, followed by an open discussion.
> >>
> >> Please let us know if there are any other requests. A few days before
> >> the meeting, I will send out a Zoom meeting link.
> >>
> >> Thanks,
> >> Botong
> >>
> >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pku...@gmail.com> wrote:
> >>
> >>> Hi Julian and all,
> >>>
> >>> We've posted the Tempura code base below. Feel free to take a quick peek
> >>> at the last five commits.
> >>>
> >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> >>>
> >>> I've also opened a Jira (CALCITE-4568
> >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
> >>> as the umbrella Jira for the feature.
> >>>
> >>> In the meantime, we encourage everyone to enter the time preferences for
> >>> our first meeting here:
> >>>
> >>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>
> >>> Thanks,
> >>> Botong
> >>>
> >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>
> >>>> I have added my time preferences to the doc.
> >>>>
> >>>> Before we meet, could you publish a PR for us to review?
> >>>>
> >>>> Initial discussions will need to be about architecture and high-level
> >>>> design. So I would ask Calcite reviewers not to review the PR line-by-line
> >>>> (or to leave comments in GitHub) but to try to understand the design
> >>>> holistically, and to prepare questions/comments before the meeting.
> >>>>
> >>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is
> >>>> how we track long-running tasks such as this.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pku...@gmail.com> wrote:
> >>>> >
> >>>> > Hi all,
> >>>> >
> >>>> > Apologies for the delay. It took us some time to clean up our code base
> >>>> > and publicly release it for a quick peek (the release will be out soon).
> >>>> >
> >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> >>>> > meeting and discuss how to integrate Tempura into Calcite.
> >>>> >
> >>>> > Since some of our team members are in China, we prefer the time slot of
> >>>> > 7:00pm-11:30pm PST on any day. I've added our time preferences to the
> >>>> > shared doc below.
> >>>> >
> >>>> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>> >
> >>>> > We encourage everyone to add their time preferences (during 04/15-04/30)
> >>>> > to this doc. In a week or so, we will try to settle on a time that works
> >>>> > for most.
> >>>> >
> >>>> > Thanks,
> >>>> > Botong
> >>>> >
> >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pku...@gmail.com> wrote:
> >>>> >
> >>>> >> Hi Julian and Rui,
> >>>> >>
> >>>> >> Sounds good to us. Please give us some time to prepare slides for the
> >>>> >> meeting.
> >>>> >>
> >>>> >> I've created a doc below for discussion. Please feel free to add more
> >>>> >> here:
> >>>> >>
> >>>> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Botong
> >>>> >>
> >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>> >>
> >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I think we
> >>>> >>> should create it to continue discussion after the first meeting.
> >>>> >>>
> >>>> >>> Julian
> >>>> >>>
> >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>> >>>>
> >>>> >>>> I think good next steps would be a PR and a meeting. The PR will allow
> >>>> >>>> us to read the code, but I think we should do the first round of
> >>>> >>>> questions at the meeting. The meeting could perhaps start with a
> >>>> >>>> presentation of the paper (do you have some slides you are planning to
> >>>> >>>> present at VLDB, Botong?) and then move on to questions about the
> >>>> >>>> concepts, which alternatives were considered, and how the concepts map
> >>>> >>>> onto other current and future concepts in Calcite.
> >>>> >>>>
> >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at this
> >>>> >>>> point. We need to understand the high-level concepts and design
> >>>> >>>> choices. If we start reviewing the PR, we will get lost in the details.
> >>>> >>>>
> >>>> >>>> I know that integrating a major change is hard; I doubt that we will be
> >>>> >>>> able to integrate everything, but we can build understanding about
> >>>> >>>> where Calcite needs to go and, I hope, integrate a good amount of code
> >>>> >>>> to help us get there.
> >>>> >>>>
> >>>> >>>> As I said before, after the integration I would like people to be able
> >>>> >>>> to experiment with it and use it in their production systems. That
> >>>> >>>> way, it will not be an experiment that withers, but a feature set that
> >>>> >>>> integrates with other Calcite features and gets stronger over time.
> >>>> >>>>
> >>>> >>>> Julian
> >>>> >>>>
> >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <amaliu...@apache.org> wrote:
> >>>> >>>>>
> >>>> >>>>> For me to participate in the discussion of the above questions, I will
> >>>> >>>>> need to read a lot more to learn the relevant context, and I will
> >>>> >>>>> likely ask lots of questions :-). An editable doc is probably good for
> >>>> >>>>> questions and back-and-forth discussion.
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>> -Rui
> >>>> >>>>>
> >>>> >>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <amaliu...@apache.org> wrote:
> >>>> >>>>>>
> >>>> >>>>>> I am also happy to help push this work into Calcite (review code and
> >>>> >>>>>> doc, etc.).
> >>>> >>>>>>
> >>>> >>>>>> Besides sharing your code so that people get a better idea of how it
> >>>> >>>>>> is implemented, I think it would also be nice to have a doc to discuss
> >>>> >>>>>> the open questions above. I have copied some of them here:
> >>>> >>>>>>
> >>>> >>>>>> 1. Can this solution be compatible with existing solutions in Calcite
> >>>> >>>>>> streaming, materialized view maintenance, and multi-query optimization
> >>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool operator)?
> >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for
> >>>> >>>>>> “view maintenance” and another for “user queries” - since the
> >>>> >>>>>> objectives of each activity are so different?
> >>>> >>>>>> 3. Will this work hasten the arrival of multi-objective parametric
> >>>> >>>>>> query optimization [1] in Calcite?
> >>>> >>>>>> 4. Probably SQL shell support.
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> [1]:
> >>>> >>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> -Rui
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zinki...@gmail.com> wrote:
> >>>> >>>>>>
> >>>> >>>>>>> It would be very nice to see a POC of your work.
> >>>> >>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pku...@gmail.com> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>>> Hi Julian,
> >>>> >>>>>>>>
> >>>> >>>>>>>> Just checking whether there are any updates. Would it help if we
> >>>> >>>>>>>> posted our code for a quick preview?
> >>>> >>>>>>>>
> >>>> >>>>>>>> Thanks,
> >>>> >>>>>>>> Botong
> >>>> >>>>>>>>
> >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pku...@gmail.com> wrote:
> >>>> >>>>>>>>
> >>>> >>>>>>>>> Hi Julian,
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best
> >>>> >>>>>>>>> benefits the community. Here are some clarifications that hopefully
> >>>> >>>>>>>>> answer your questions.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points at which
> >>>> >>>>>>>>> to consider running the query and a cost function that expresses
> >>>> >>>>>>>>> their preference over time; Tempura then generates the best
> >>>> >>>>>>>>> incremental plan, i.e. the one that minimizes the overall cost
> >>>> >>>>>>>>> function.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can
> >>>> >>>>>>>>> differ from each other, as opposed to the identical plans used in all
> >>>> >>>>>>>>> delta runs in streaming or IVM. As mentioned in §2.1 of the Tempura
> >>>> >>>>>>>>> paper, we can mimic the current streaming implementation by
> >>>> >>>>>>>>> specifying two (logical) time points in Tempura, representing the
> >>>> >>>>>>>>> initial run and later delta runs respectively. More generally,
> >>>> >>>>>>>>> Tempura supports various forms of incremental computing, not only the
> >>>> >>>>>>>>> small-delta, append-only data model of streaming systems. That's why
> >>>> >>>>>>>>> we believe Tempura subsumes the current streaming support, as well as
> >>>> >>>>>>>>> any IVM implementation.
> >>>> >>>>>>>>>
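The two-time-point idea above can be illustrated with a minimal sketch. It assumes an append-only input table and a SUM-by-key query, both hypothetical and chosen only for illustration: the sub-plan at the first time point aggregates the whole snapshot, while the sub-plan at the second time point aggregates only the delta and merges it into the result saved by the first run. This is ordinary incremental maintenance of a distributive aggregate, not Tempura's code or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (key, value)


def full_sum(rows: List[Row]) -> Dict[str, int]:
    """Sub-plan for the initial run: aggregate the entire snapshot."""
    out: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        out[key] += value
    return dict(out)


def delta_sum(saved: Dict[str, int], delta_rows: List[Row]) -> Dict[str, int]:
    """Sub-plan for a later run: aggregate only the new rows and merge them
    into the result materialized by the previous run."""
    out: Dict[str, int] = defaultdict(int, saved)  # copy of the saved result
    for key, value in delta_rows:
        out[key] += value
    return dict(out)


# Hypothetical append-only table observed at two logical time points.
snapshot_t1: List[Row] = [("a", 1), ("b", 2), ("a", 3)]
delta_t1_t2: List[Row] = [("b", 5), ("c", 7)]

result_t1 = full_sum(snapshot_t1)              # initial run at t1
result_t2 = delta_sum(result_t1, delta_t1_t2)  # delta run at t2
print(result_t2)  # {'a': 4, 'b': 7, 'c': 7}
```

The second sub-plan differs from the first (it reads the saved result instead of rescanning the table), yet together the two time points produce the same answer as a full recomputation over `snapshot_t1 + delta_t1_t2`.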
> >>>> >>>>>>>>> About the cost model: we did not come up with a separate cost model,
> >>>> >>>>>>>>> but rather extended the existing one. Similar to multi-objective
> >>>> >>>>>>>>> optimization, costs incurred at different time points are treated as
> >>>> >>>>>>>>> different dimensions. Tempura lets users supply a function that
> >>>> >>>>>>>>> converts this cost vector into a final cost. Under this function, any
> >>>> >>>>>>>>> two incremental plans are still comparable and there is an overall
> >>>> >>>>>>>>> optimum. I guess we could go down the route of multi-objective
> >>>> >>>>>>>>> parametric query optimization instead if there is a need.
> >>>> >>>>>>>>>
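A minimal sketch of how a user-supplied function makes cost vectors comparable. It assumes each candidate incremental plan has already been summarized as one cost per time point and that the user's function is a weighted sum; the plan names, numbers, and weights are hypothetical, and Tempura's actual cost interface may look quite different.

```python
from typing import Callable, Dict, List

CostVector = List[float]  # one cost per time point: [t1, t2, ...]


def weighted_cost(weights: List[float]) -> Callable[[CostVector], float]:
    """Build a (hypothetical) user-supplied function that collapses a cost
    vector into a single number; a higher weight marks a time point where
    resources are more expensive."""
    def final_cost(costs: CostVector) -> float:
        return sum(w * c for w, c in zip(weights, costs))
    return final_cost


def best_plan(plans: Dict[str, CostVector],
              final_cost: Callable[[CostVector], float]) -> str:
    """Every plan maps to one scalar, so plans are totally ordered and the
    optimum is simply the minimum."""
    return min(plans, key=lambda name: final_cost(plans[name]))


# Hypothetical candidate plans over two time points (t1 off-peak, t2 peak).
plans: Dict[str, CostVector] = {
    "all_at_t2": [0.0, 100.0],    # run everything at the deadline
    "incremental": [60.0, 50.0],  # pre-compute at t1, finish at t2
}

print(best_plan(plans, weighted_cost([1.0, 1.0])))  # all_at_t2
print(best_plan(plans, weighted_cost([0.2, 1.0])))  # incremental
```

Changing only the weights flips the choice: with equal weights the single run at t2 is cheaper overall (100 vs. 110), while discounting the off-peak time point t1 favors the incremental plan (62 vs. 100).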
> >>>> >>>>>>>>> Next, on materialized views and multi-query optimization: since our
> >>>> >>>>>>>>> multi-time-point plan naturally involves materializing intermediate
> >>>> >>>>>>>>> results for later time points, we need to solve the problem of
> >>>> >>>>>>>>> choosing materializations, and to include the cost of saving and
> >>>> >>>>>>>>> reusing them when costing and comparing plans. We borrowed
> >>>> >>>>>>>>> multi-query optimization techniques to solve this problem even though
> >>>> >>>>>>>>> we are looking at a single query. As a result, we think our work is
> >>>> >>>>>>>>> orthogonal to Calcite's facilities for utilizing existing views,
> >>>> >>>>>>>>> lattices, etc. We do feel that the multi-query optimization component
> >>>> >>>>>>>>> could be adapted for wider use, but that probably needs more input
> >>>> >>>>>>>>> from the community.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should
> >>>> >>>>>>>>> be straightforward to hook it up to the SQL shell.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks,
> >>>> >>>>>>>>> Botong
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>> Botong,
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> This is very exciting; congratulations on this research, and thank
> >>>> >>>>>>>>>> you for contributing it back to Calcite.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> >>>> >>>>>>>>>> materialized view maintenance, and multi-query optimization. As we
> >>>> >>>>>>>>>> already have some solutions in those areas (Sigma and Delta
> >>>> >>>>>>>>>> relational operators, lattice, and Spool operator), it will be
> >>>> >>>>>>>>>> interesting to see whether we can make them compatible, or whether
> >>>> >>>>>>>>>> one concept can subsume others.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> Your work differs from streaming queries in that your relations are
> >>>> >>>>>>>>>> used by “external” user queries, whereas in pure streaming queries,
> >>>> >>>>>>>>>> the only activity is the change propagation. Did you find that you
> >>>> >>>>>>>>>> needed two separate cost models - one for “view maintenance” and
> >>>> >>>>>>>>>> another for “user queries” - since the objectives of each activity
> >>>> >>>>>>>>>> are so different?
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> >>>> >>>>>>>>>> multi-objective parametric query optimization [1] in Calcite.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> I will make time over the next few days to read and digest your
> >>>> >>>>>>>>>> paper. Then I expect that we will have a back-and-forth process to
> >>>> >>>>>>>>>> create something that will be useful for the broader community.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> One thing will be particularly useful: making this functionality
> >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment with it
> >>>> >>>>>>>>>> without writing Java code or setting up complex databases and
> >>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL operations
> >>>> >>>>>>>>>> that are available in Calcite’s ‘server’ module. I wonder whether we
> >>>> >>>>>>>>>> could devise some kind of SQL syntax for a “multi-query”.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> Julian
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> [1]
> >>>> >>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pku...@gmail.com> wrote:
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Thanks, Aron, for pointing this out. To see the figure, please refer
> >>>> >>>>>>>>>>> to Fig. 3(a) in our paper:
> >>>> >>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Best,
> >>>> >>>>>>>>>>> Botong
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <taojia...@gmail.com> wrote:
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail; could you
> >>>> >>>>>>>>>>>> open a JIRA for this, so that people who are interested can
> >>>> >>>>>>>>>>>> subscribe to it?
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> Regards!
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> Aron Tao
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bot...@apache.org> wrote:
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Hi all,
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general
> >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper published
> >>>> >>>>>>>>>>>>> in VLDB 2021:
> >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental
> >>>> >>>>>>>>>>>>> data processing
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query optimizer to
> >>>> >>>>>>>>>>>>> alleviate cluster-wide resource skew:
> >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> >>>> >>>>>>>>>>>>> Computing
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based
> >>>> >>>>>>>>>>>>> incremental optimizer that can find the best plan across multiple
> >>>> >>>>>>>>>>>>> families of incremental computing methods, including IVM,
> >>>> >>>>>>>>>>>>> streaming, DBToaster, etc. Experiments in the paper show that the
> >>>> >>>>>>>>>>>>> generated best plan is consistently much better than the plans from
> >>>> >>>>>>>>>>>>> each individual method alone.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view
> >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in
> >>>> >>>>>>>>>>>>> active databases, resumable query execution, approximate query
> >>>> >>>>>>>>>>>>> processing, etc. We hope that this feature can help broaden the
> >>>> >>>>>>>>>>>>> scope of Calcite and attract more use cases and adoption.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Below is a brief description of the technical details. Please refer
> >>>> >>>>>>>>>>>>> to the Tempura paper for more details. We are also working on a
> >>>> >>>>>>>>>>>>> journal version of the paper with more implementation details.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Currently, the query plan generated by Calcite is meant to be
> >>>> >>>>>>>>>>>>> executed all at once. In this proposal, Calcite’s memo will be
> >>>> >>>>>>>>>>>>> extended with temporal information so that it can generate
> >>>> >>>>>>>>>>>>> incremental plans that include multiple sub-plans to be executed at
> >>>> >>>>>>>>>>>>> different time points.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time
> >>>> >>>>>>>>>>>>> (a Time-Varying Relation, or TVR). To achieve that, we introduced
> >>>> >>>>>>>>>>>>> TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
> >>>> >>>>>>>>>>>>> track the related RelSets of a changing table (e.g. a snapshot of
> >>>> >>>>>>>>>>>>> the table at a certain time, the delta of the table between two time
> >>>> >>>>>>>>>>>>> points, etc.).
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> [image: not included in this text; see Fig. 3(a) of the Tempura paper]
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
> >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal
> >>>> >>>>>>>>>>>>> lines represent time. Each black dot in the grid is a RelSet. Users
> >>>> >>>>>>>>>>>>> can write TVR rewrite rules to describe valid transformations
> >>>> >>>>>>>>>>>>> between these dots. For example, the blue lines are inter-TVR rules
> >>>> >>>>>>>>>>>>> that describe how to compute a certain RelSet of a TVR from RelSets
> >>>> >>>>>>>>>>>>> of other TVRs. The red lines are intra-TVR rules that describe
> >>>> >>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are logical
> >>>> >>>>>>>>>>>>> rules. All existing Calcite rules still work in the new Volcano
> >>>> >>>>>>>>>>>>> system without modification.
> >>>> >>>>>>>>>>>>>
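A minimal sketch of the two kinds of TVR rewrite rules described above. It assumes relations are bags (Python `Counter`s), deltas are insert-only, and an inner join stands in for the figure's left outer join to keep the delta formula short. The intra-TVR rule relates two snapshots of one TVR through a delta; the inter-TVR rule derives the delta of S join R from the deltas of S and R, using the standard IVM delta rule for inner joins. All names are hypothetical and not part of Tempura or Calcite.

```python
from collections import Counter

# A relation is modeled as a bag: Counter mapping tuple -> multiplicity.
# Assumption for this sketch: deltas are insert-only, so Counter "+" (which
# drops non-positive counts) is a correct bag union.


def join(left: Counter, right: Counter) -> Counter:
    """Bag inner join on the first column: (k, x) with (k, y) -> (k, x, y)."""
    out: Counter = Counter()
    for (kl, x), cl in left.items():
        for (kr, y), cr in right.items():
            if kl == kr:
                out[(kl, x, y)] += cl * cr
    return out


# Hypothetical TVRs S and R: a snapshot at t1 and a delta from t1 to t2.
s_t1 = Counter({(1, "s1"): 1, (2, "s2"): 1})
r_t1 = Counter({(1, "r1"): 1})
ds = Counter({(2, "s3"): 1})
dr = Counter({(2, "r2"): 1, (1, "r3"): 1})

# Intra-TVR rule: snapshot(t2) = snapshot(t1) + delta(t1, t2).
s_t2 = s_t1 + ds
r_t2 = r_t1 + dr

# Inter-TVR rule: delta(S join R) = dS join R(t1) + S(t2) join dR.
dj = join(ds, r_t1) + join(s_t2, dr)

# Applying the intra-TVR rule to S join R reproduces its t2 snapshot.
j_t2 = join(s_t1, r_t1) + dj
print(sorted(j_t2))
# [(1, 's1', 'r1'), (1, 's1', 'r3'), (2, 's2', 'r2'), (2, 's3', 'r2')]
```

The equivalence is what gives the optimizer a choice: the t2 snapshot of S join R can be computed either by a full join over the t2 snapshots or from the t1 result plus the delta, and both land in the same RelSet.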
> >>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
> >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet.
> >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSets and RelNodes,
> >>>> >>>>>>>>>>>>> as well as the links between the nodes.
> >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine
> >>>> >>>>>>>>>>>>> API.
> >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
> >>>> >>>>>>>>>>>>> involving multiple time points.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Note that this feature is an extension by nature; when disabled, it
> >>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Beyond the scenarios in the paper, we also applied this
> >>>> >>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic
> >>>> >>>>>>>>>>>>> query called the “range query” in Alibaba’s data warehouse. It
> >>>> >>>>>>>>>>>>> achieved cost savings of 80% on total CPU and memory consumption,
> >>>> >>>>>>>>>>>>> and 60% on end-to-end execution time.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>> Botong
> >>>> >>>>>>>>>>>>>
>


-- 
Viliam Durina
Jet Developer
      hazelcast®

  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
+1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
