Re: Proposal to extend Calcite into a incremental query optimizer

Botong Huang Wed, 21 Apr 2021 17:20:36 -0700

Hi all,
We've created a zoom meeting below for our meeting next Monday (9pm-10:30pm
PST on 04/26).
Talk to you all soon!


Join Zoom Meeting
https://uci.zoom.us/j/91279732686
<https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE>

Meeting ID: 912 7973 2686
One tap mobile
+16699006833,,91279732686# US (San Jose)
+12532158782,,91279732686# US (Tacoma)

Dial by your location
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
Meeting ID: 912 7973 2686
Find your local number: https://uci.zoom.us/u/aykHTkJBh
<https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM>

Join by Skype for Business
https://uci.zoom.us/skype/91279732686
<https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy>


Thanks,
Botong

On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <[email protected]> wrote:

> Hi all,
>
> According to the preferences collected, we are tentatively scheduling our
> meeting at 9pm-10:30pm PST on 04/26 Monday.
>
> We will give a presentation about Tempura, followed by a free discussion.
>
> Please let us know if there are new other requests. Few days before
> the meeting, I will send out a zoom meeting link.
>
> Thanks,
> Botong
>
> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <[email protected]> wrote:
>
>> Hi Julian and all,
>>
>> We've posted the Tempura code base below. Feel free to take a quick peek
>> at the last five commits.
>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>>
>> I've also opened a Jira (CALCITE-4568
>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
>> as the umbrella Jira for the feature.
>>
>> In the meantime, we encourage everyone to enter the time preferences for
>> our first meeting here:
>>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>
>> Thanks,
>> Botong
>>
>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <[email protected]>
>> wrote:
>>
>>> I have added my time preferences to the doc.
>>>
>>> Before we meet, could you publish a PR for us to review?
>>>
>>> Initial discussions will need to be about architecture and high-level
>>> design. So I would ask Calcite reviewers not to review the PR line-by-line
>>> (or to leave comments in GitHub) but try to understand the design
>>> holistically, and prepare questions/comments before the meeting.
>>>
>>> Botong, Can you please create a Calcite JIRA case for this task? JIRA
>>> how we track long-running tasks such as this.
>>>
>>> Julian
>>>
>>>
>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <[email protected]> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Apology for the delay. It took us some time to clean up our code base
>>> and
>>> > publicly release it (which will be out soon) for a quick peek.
>>> >
>>> > We are ready to present our work. Let's schedule a time for a Zoom
>>> > meeting and discuss how to integrate Tempura into Calcite.
>>> >
>>> > Since some of our team members are in China, we prefer the time slot of
>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
>>> shared
>>> > doc below.
>>> >
>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>> >
>>> > We encourage everyone to add their time preferences (during
>>> 04/15-04/30) in
>>> > this doc. In a week or so, we will try to settle a time that works for
>>> > most.
>>> >
>>> > Thanks,
>>> > Botong
>>> >
>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <[email protected]> wrote:
>>> >
>>> >> Hi Julian and Rui,
>>> >>
>>> >> Sounds good to us. Please give us some time to prepare some slides
>>> for the
>>> >> meeting.
>>> >>
>>> >> I've created a doc below for discussion. Please feel free to add more
>>> in
>>> >> here:
>>> >>
>>> >>
>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>> >>
>>> >> Thanks,
>>> >> Botong
>>> >>
>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <[email protected]>
>>> >> wrote:
>>> >>
>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
>>> think we
>>> >>> should create it to continue discussion after the first meeting.
>>> >>>
>>> >>> Julian
>>> >>>
>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>> I think good next steps would be a PR and a meeting. The PR will
>>> allow
>>> >>> us to read the code, but I think we should do the first round of
>>> questions
>>> >>> at the meeting.  The meeting could perhaps start with a presentation
>>> of the
>>> >>> paper (do you have some slides you are planning to present at VLDB,
>>> >>> Botong?) and then move on to questions about the concepts, which
>>> >>> alternatives were considered, and how the concepts map onto other
>>> current
>>> >>> and future concepts in calcite.
>>> >>>>
>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at
>>> this
>>> >>> point. We need to understand the high-level concepts and design
>>> choices. If
>>> >>> we start reviewing the PR we will get lost in the details.
>>> >>>>
>>> >>>> I know that integrating a major change is hard; I doubt that we
>>> will be
>>> >>> able to integrate everything, but we can build understanding about
>>> where
>>> >>> calcite needs to go, and I hope integrate a good amount of code to
>>> help us
>>> >>> get there.
>>> >>>>
>>> >>>> As I said before, after the integration I would like people to be
>>> able
>>> >>> to experiment with it and use it in their production systems.  That
>>> way, it
>>> >>> will not be an experiment that withers, but a feature set integrates
>>> with
>>> >>> other calcite features and gets stronger over time.
>>> >>>>
>>> >>>> Julian
>>> >>>>
>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <[email protected]>
>>> wrote:
>>> >>>>>
>>> >>>>> For me to participate in the discussion for the above questions, I
>>> >>> will
>>> >>>>> need to read a lot more to know relevant context and likely ask
>>> lots of
>>> >>>>> questions :-).  A editable doc is probably good for questions and
>>> back
>>> >>> and
>>> >>>>> forward discussion.
>>> >>>>>
>>> >>>>>
>>> >>>>> -Rui
>>> >>>>>
>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <[email protected]>
>>> >>> wrote:
>>> >>>>>>
>>> >>>>>> I am also happy to help push this work into Calcite (review code
>>> and
>>> >>> doc,
>>> >>>>>> etc.).
>>> >>>>>>
>>> >>>>>> While you can share your code so people can have more idea how it
>>> is
>>> >>>>>> implemented, I think it would be also nice to have a doc to
>>> discuss
>>> >>> open
>>> >>>>>> questions above. Some points that I copy those to here:
>>> >>>>>>
>>> >>>>>> 1. Can this solution be compatible with existing solutions in
>>> Calcite
>>> >>>>>> Streaming, materialized view maintenance, and multi-query
>>> optimization
>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool
>>> operator),
>>> >>>>>> 2. Did you find that you needed two separate cost models - one for
>>> >>> “view
>>> >>>>>> maintenance” and another for “user queries” - since the
>>> objectives of
>>> >>> each
>>> >>>>>> activity are so different?
>>> >>>>>> 3. whether this work will hasten the arrival of multi-objective
>>> >>> parametric
>>> >>>>>> query optimization [1] in Calcite.
>>> >>>>>> 4. probably SQL shell support.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> [1]:
>>> >>>>>>
>>> >>>
>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> -Rui
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <[email protected]>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> it would be very nice to see a POC of your work.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <[email protected]
>>> >
>>> >>> wrote:
>>> >>>>>>>
>>> >>>>>>>> Hi Julian,
>>> >>>>>>>>
>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it
>>> >>> would
>>> >>>>>>> help
>>> >>>>>>>> to post our code for a quick preview.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Botong
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <[email protected]>
>>> >>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Hi Julian,
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan that
>>> best
>>> >>>>>>> benefits
>>> >>>>>>>>> the community. Here are some clarifications that hopefully
>>> answer
>>> >>> your
>>> >>>>>>>>> questions.
>>> >>>>>>>>>
>>> >>>>>>>>> In our work (Tempura), users specify the set of time points to
>>> >>>>>>> consider
>>> >>>>>>>>> running and a cost function that expresses users' preference
>>> over
>>> >>>>>>> time,
>>> >>>>>>>>> Tempura will generate the best incremental plan that minimizes
>>> the
>>> >>>>>>>> overall
>>> >>>>>>>>> cost function.
>>> >>>>>>>>>
>>> >>>>>>>>> In this incremental plan, the sub-plans at different time
>>> points
>>> >>> can
>>> >>>>>>> be
>>> >>>>>>>>> different from each other, as opposed to identical plans in all
>>> >>> delta
>>> >>>>>>>> runs
>>> >>>>>>>>> as in streaming or IVM. As mentioned in $2.1 of the Tempura
>>> paper,
>>> >>> we
>>> >>>>>>> can
>>> >>>>>>>>> mimic the current streaming implementation by specifying two
>>> >>> (logical)
>>> >>>>>>>> time
>>> >>>>>>>>> points in Tempura, representing the initial run and later delta
>>> >>> runs
>>> >>>>>>>>> respectively. In general, note that Tempura supports various
>>> form
>>> >>> of
>>> >>>>>>>>> incremental computing, not only the small-delta append-only
>>> data
>>> >>>>>>> model in
>>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes the
>>> >>> current
>>> >>>>>>>>> streaming support, as well as any IVM implementations.
>>> >>>>>>>>>
>>> >>>>>>>>> About the cost model, we did not come up with a seperate cost
>>> >>> model,
>>> >>>>>>> but
>>> >>>>>>>>> rather extended the existing one. Similar to multi-objective
>>> >>>>>>>> optimization,
>>> >>>>>>>>> costs incurred at different time points are considered
>>> different
>>> >>>>>>>>> dimensions. Tempura lets users supply a function that converts
>>> this
>>> >>>>>>> cost
>>> >>>>>>>>> vector into a final cost. So under this function, any two
>>> >>> incremental
>>> >>>>>>>> plans
>>> >>>>>>>>> are still comparable and there is an overall optimum. I guess
>>> we
>>> >>> can
>>> >>>>>>> go
>>> >>>>>>>>> down the route of multi-objective parametric query optimization
>>> >>>>>>> instead
>>> >>>>>>>> if
>>> >>>>>>>>> there is a need.
>>> >>>>>>>>>
>>> >>>>>>>>> Next on materialized views and multi-query optimization, since
>>> our
>>> >>>>>>>>> multi-time-point plan naturally involves materializing
>>> intermediate
>>> >>>>>>>> results
>>> >>>>>>>>> for later time points, we need to solve the problem of choosing
>>> >>>>>>>>> materializations and include the cost of saving and reusing the
>>> >>>>>>>>> materializations when costing and comparing plans. We borrowed
>>> the
>>> >>>>>>>>> multi-query optimization techniques to solve this problem even
>>> >>> though
>>> >>>>>>> we
>>> >>>>>>>>> are looking at a single query. As a result, we think our work
>>> is
>>> >>>>>>>> orthogonal
>>> >>>>>>>>> to Calcite's facilities around utilizing existing views,
>>> lattice
>>> >>> etc.
>>> >>>>>>> We
>>> >>>>>>>> do
>>> >>>>>>>>> feel that the multi-query optimization component can be
>>> adopted to
>>> >>>>>>> wider
>>> >>>>>>>>> use, but probably need more suggestions from the community.
>>> >>>>>>>>>
>>> >>>>>>>>> Lastly, our current implementation is set up in java code, it
>>> >>> should
>>> >>>>>>> be
>>> >>>>>>>>> straightforward to hook it up with SQL shell.
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks,
>>> >>>>>>>>> Botong
>>> >>>>>>>>>
>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>>> >>> [email protected]>
>>> >>>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Botong,
>>> >>>>>>>>>>
>>> >>>>>>>>>> This is very exciting; congratulations on this research, and
>>> thank
>>> >>>>>>> you
>>> >>>>>>>>>> for contributing it back to Calcite.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
>>> >>>>>>> materialized
>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we have
>>> already
>>> >>>>>>> some
>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
>>> operators,
>>> >>>>>>> lattice,
>>> >>>>>>>>>> and Spool operator), it will be interesting to see whether we
>>> can
>>> >>>>>>> make
>>> >>>>>>>> them
>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Your work differs from streaming queries in that your
>>> relations
>>> >>> are
>>> >>>>>>> used
>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
>>> queries, the
>>> >>>>>>> only
>>> >>>>>>>>>> activity is the change propagation. Did you find that you
>>> needed
>>> >>> two
>>> >>>>>>>>>> separate cost models - one for “view maintenance” and another
>>> for
>>> >>>>>>> “user
>>> >>>>>>>>>> queries” - since the objectives of each activity are so
>>> different?
>>> >>>>>>>>>>
>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
>>> >>> multi-objective
>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I will make time over the next few days to read and digest
>>> your
>>> >>>>>>> paper.
>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process to
>>> create
>>> >>>>>>>>>> something that will be useful for the broader community.
>>> >>>>>>>>>>
>>> >>>>>>>>>> One thing will be particularly useful: making this
>>> functionality
>>> >>>>>>>>>> available from a SQL shell, so that people can experiment with
>>> >>> this
>>> >>>>>>>>>> functionality without writing Java code or setting up complex
>>> >>>>>>> databases
>>> >>>>>>>> and
>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
>>> operations
>>> >>>>>>> that
>>> >>>>>>>> are
>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we
>>> could
>>> >>>>>>> devise
>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Julian
>>> >>>>>>>>>>
>>> >>>>>>>>>> [1]
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>
>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <[email protected]>
>>> >>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please
>>> >>> refer
>>> >>>>>>> to
>>> >>>>>>>>>> Fig
>>> >>>>>>>>>>> 3(a) in our paper:
>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Best,
>>> >>>>>>>>>>> Botong
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>>> [email protected]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Seems interesting, the pic can not be seen in the mail, may
>>> you
>>> >>>>>>> open
>>> >>>>>>>> a
>>> >>>>>>>>>> JIRA
>>> >>>>>>>>>>>> for this, people who are interested in this can subscribe
>>> to the
>>> >>>>>>>> JIRA?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Regards!
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Aron Tao
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Botong Huang <[email protected]> 于2020年12月24日周四 上午3:18写道：
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> Hi all,
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
>>> >>> general
>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper
>>> >>>>>>> published
>>> >>>>>>>> in
>>> >>>>>>>>>>>> VLDB
>>> >>>>>>>>>>>>> 2021:
>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
>>> >>> incremental
>>> >>>>>>>> data
>>> >>>>>>>>>>>>> processing
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
>>> Alibaba’s
>>> >>>>>>> data
>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
>>> optimizer
>>> >>> to
>>> >>>>>>>>>>>> alleviate
>>> >>>>>>>>>>>>> cluster-wise resource skewness:
>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
>>> >>> Incremental
>>> >>>>>>>>>>>> Computing
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> To our best knowledge, this is the first general cost-based
>>> >>>>>>>>>> incremental
>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
>>> families
>>> >>> of
>>> >>>>>>>>>>>>> incremental computing methods, including IVM, Streaming,
>>> >>>>>>> DBToaster,
>>> >>>>>>>>>> etc.
>>> >>>>>>>>>>>>> Experiments (in the paper) shows that the generated best
>>> plan
>>> >>> is
>>> >>>>>>>>>>>>> consistently much better than the plans from each
>>> individual
>>> >>>>>>> method
>>> >>>>>>>>>>>> alone.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> In general, incremental query planning is central to
>>> database
>>> >>>>>>> view
>>> >>>>>>>>>>>>> maintenance and stream processing systems, and are being
>>> >>> adopted
>>> >>>>>>> in
>>> >>>>>>>>>>>> active
>>> >>>>>>>>>>>>> databases, resumable query execution, approximate query
>>> >>>>>>> processing,
>>> >>>>>>>>>> etc.
>>> >>>>>>>>>>>> We
>>> >>>>>>>>>>>>> are hoping that this feature can help widening the
>>> spectrum of
>>> >>>>>>>>>> Calcite,
>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
>>> Please
>>> >>>>>>> refer
>>> >>>>>>>> to
>>> >>>>>>>>>>>> the
>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working on a
>>> >>> journal
>>> >>>>>>>>>> version
>>> >>>>>>>>>>>> of
>>> >>>>>>>>>>>>> the paper with more implementation details.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to
>>> be
>>> >>>>>>>> executed
>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo will be
>>> >>>>>>> extended
>>> >>>>>>>>>> with
>>> >>>>>>>>>>>>> temporal information so that it is capable of generating
>>> >>>>>>> incremental
>>> >>>>>>>>>>>> plans
>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at different
>>> time
>>> >>>>>>> points.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes
>>> over
>>> >>> time
>>> >>>>>>>>>> (Time
>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we introduced
>>> >>>>>>> TvrMetaSet
>>> >>>>>>>>>> into
>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track
>>> related
>>> >>>>>>> RelSets
>>> >>>>>>>>>> of a
>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain time,
>>> >>>>>>> delta of
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>>> table between two time points, etc.).
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> [image: image.png]
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> For example in the above figure, each vertical line is a
>>> >>>>>>> TvrMetaSet
>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
>>> >>> Horizontal
>>> >>>>>>>> lines
>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet.
>>> Users
>>> >>> can
>>> >>>>>>>>>> write
>>> >>>>>>>>>>>> TVR
>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations between
>>> these
>>> >>>>>>> dots.
>>> >>>>>>>>>> For
>>> >>>>>>>>>>>>> example, the blues lines are inter-TVR rules that describe
>>> how
>>> >>> to
>>> >>>>>>>>>> compute
>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The red
>>> >>> lines
>>> >>>>>>>> are
>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a
>>> TVR. All
>>> >>>>>>> TVR
>>> >>>>>>>>>>>> rewrite
>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still
>>> work
>>> >>> in
>>> >>>>>>>> the
>>> >>>>>>>>>> new
>>> >>>>>>>>>>>>> volcano system without modification.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
>>> >>>>>>> RelNodes,
>>> >>>>>>>>>> as
>>> >>>>>>>>>>>>> well as links in between the nodes.
>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule
>>> >>>>>>> engine
>>> >>>>>>>>>> API.
>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
>>> incremental
>>> >>>>>>> plan
>>> >>>>>>>>>>>>> involving multiple time points.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus
>>> when
>>> >>>>>>>>>> disabled,
>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this
>>> >>>>>>>>>> Calcite-extended
>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic query
>>> called
>>> >>>>>>> the
>>> >>>>>>>>>>>> ‘‘range
>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
>>> savings
>>> >>> of
>>> >>>>>>> 80%
>>> >>>>>>>>>> on
>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on end-to-end
>>> >>> execution
>>> >>>>>>>>>> time.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
>>> >>>>>>> holidays!
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>> Botong
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> ~~~~~~~~~~~~~~~
>>> >>>>>>> no mistakes
>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
>>> >>>>>>>
>>> >>>>>>
>>> >>>
>>> >>
>>>
>>>

Re: Proposal to extend Calcite into a incremental query optimizer

Reply via email to