Hi all,

The meeting yesterday was fun and productive. As discussed, this is the
call to schedule our second meeting.

We encourage everyone to add their time preferences during 05/01 - 05/15
here:
https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

Thanks,
Botong

On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <[email protected]> wrote:

> Hi all,
> We've created a zoom meeting below for our meeting next Monday
> (9pm-10:30pm PST on 04/26).
> Talk to you all soon!
>
> Join Zoom Meeting
> https://uci.zoom.us/j/91279732686
> <https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE>
>
> Meeting ID: 912 7973 2686
> One tap mobile
> +16699006833,,91279732686# US (San Jose)
> +12532158782,,91279732686# US (Tacoma)
>
> Dial by your location
> +1 669 900 6833 US (San Jose)
> +1 253 215 8782 US (Tacoma)
> +1 346 248 7799 US (Houston)
> +1 301 715 8592 US (Washington DC)
> +1 312 626 6799 US (Chicago)
> +1 646 558 8656 US (New York)
> Meeting ID: 912 7973 2686
> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> <https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM>
>
> Join by Skype for Business
> https://uci.zoom.us/skype/91279732686
> <https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy>
>
>
> Thanks,
> Botong
>
> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <[email protected]> wrote:
>
>> Hi all,
>>
>> According to the preferences collected, we are tentatively scheduling our
>> meeting at 9pm-10:30pm PST on 04/26 Monday.
>>
>> We will give a presentation about Tempura, followed by a free discussion.
>>
>> Please let us know if there are new other requests. Few days before
>> the meeting, I will send out a zoom meeting link.
>>
>> Thanks,
>> Botong
>>
>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <[email protected]> wrote:
>>
>>> Hi Julian and all,
>>>
>>> We've posted the Tempura code base below. Feel free to take a quick peek
>>> at the last five commits.
>>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>>>
>>> I've also opened a Jira (CALCITE-4568
>>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
>>> as the umbrella Jira for the feature.
>>>
>>> In the meantime, we encourage everyone to enter the time preferences for
>>> our first meeting here:
>>>
>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>
>>> Thanks,
>>> Botong
>>>
>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <[email protected]>
>>> wrote:
>>>
>>>> I have added my time preferences to the doc.
>>>>
>>>> Before we meet, could you publish a PR for us to review?
>>>>
>>>> Initial discussions will need to be about architecture and high-level
>>>> design. So I would ask Calcite reviewers not to review the PR line-by-line
>>>> (or to leave comments in GitHub) but try to understand the design
>>>> holistically, and prepare questions/comments before the meeting.
>>>>
>>>> Botong, Can you please create a Calcite JIRA case for this task? JIRA
>>>> how we track long-running tasks such as this.
>>>>
>>>> Julian
>>>>
>>>>
>>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <[email protected]> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Apology for the delay. It took us some time to clean up our code base
>>>> and
>>>> > publicly release it (which will be out soon) for a quick peek.
>>>> >
>>>> > We are ready to present our work. Let's schedule a time for a Zoom
>>>> > meeting and discuss how to integrate Tempura into Calcite.
>>>> >
>>>> > Since some of our team members are in China, we prefer the time slot
>>>> of
>>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
>>>> shared
>>>> > doc below.
>>>> >
>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>> >
>>>> > We encourage everyone to add their time preferences (during
>>>> 04/15-04/30) in
>>>> > this doc. In a week or so, we will try to settle a time that works for
>>>> > most.
>>>> >
>>>> > Thanks,
>>>> > Botong
>>>> >
>>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <[email protected]>
>>>> wrote:
>>>> >
>>>> >> Hi Julian and Rui,
>>>> >>
>>>> >> Sounds good to us. Please give us some time to prepare some slides
>>>> for the
>>>> >> meeting.
>>>> >>
>>>> >> I've created a doc below for discussion. Please feel free to add
>>>> more in
>>>> >> here:
>>>> >>
>>>> >>
>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>> >>
>>>> >> Thanks,
>>>> >> Botong
>>>> >>
>>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <[email protected]
>>>> >
>>>> >> wrote:
>>>> >>
>>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
>>>> think we
>>>> >>> should create it to continue discussion after the first meeting.
>>>> >>>
>>>> >>> Julian
>>>> >>>
>>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <[email protected]>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> I think good next steps would be a PR and a meeting. The PR will
>>>> allow
>>>> >>> us to read the code, but I think we should do the first round of
>>>> questions
>>>> >>> at the meeting.  The meeting could perhaps start with a
>>>> presentation of the
>>>> >>> paper (do you have some slides you are planning to present at VLDB,
>>>> >>> Botong?) and then move on to questions about the concepts, which
>>>> >>> alternatives were considered, and how the concepts map onto other
>>>> current
>>>> >>> and future concepts in calcite.
>>>> >>>>
>>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at
>>>> this
>>>> >>> point. We need to understand the high-level concepts and design
>>>> choices. If
>>>> >>> we start reviewing the PR we will get lost in the details.
>>>> >>>>
>>>> >>>> I know that integrating a major change is hard; I doubt that we
>>>> will be
>>>> >>> able to integrate everything, but we can build understanding about
>>>> where
>>>> >>> calcite needs to go, and I hope integrate a good amount of code to
>>>> help us
>>>> >>> get there.
>>>> >>>>
>>>> >>>> As I said before, after the integration I would like people to be
>>>> able
>>>> >>> to experiment with it and use it in their production systems.  That
>>>> way, it
>>>> >>> will not be an experiment that withers, but a feature set
>>>> integrates with
>>>> >>> other calcite features and gets stronger over time.
>>>> >>>>
>>>> >>>> Julian
>>>> >>>>
>>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <[email protected]>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> For me to participate in the discussion for the above questions,
>>>> I
>>>> >>> will
>>>> >>>>> need to read a lot more to know relevant context and likely ask
>>>> lots of
>>>> >>>>> questions :-).  A editable doc is probably good for questions and
>>>> back
>>>> >>> and
>>>> >>>>> forward discussion.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> -Rui
>>>> >>>>>
>>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <[email protected]
>>>> >
>>>> >>> wrote:
>>>> >>>>>>
>>>> >>>>>> I am also happy to help push this work into Calcite (review code
>>>> and
>>>> >>> doc,
>>>> >>>>>> etc.).
>>>> >>>>>>
>>>> >>>>>> While you can share your code so people can have more idea how
>>>> it is
>>>> >>>>>> implemented, I think it would be also nice to have a doc to
>>>> discuss
>>>> >>> open
>>>> >>>>>> questions above. Some points that I copy those to here:
>>>> >>>>>>
>>>> >>>>>> 1. Can this solution be compatible with existing solutions in
>>>> Calcite
>>>> >>>>>> Streaming, materialized view maintenance, and multi-query
>>>> optimization
>>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool
>>>> operator),
>>>> >>>>>> 2. Did you find that you needed two separate cost models - one
>>>> for
>>>> >>> “view
>>>> >>>>>> maintenance” and another for “user queries” - since the
>>>> objectives of
>>>> >>> each
>>>> >>>>>> activity are so different?
>>>> >>>>>> 3. whether this work will hasten the arrival of multi-objective
>>>> >>> parametric
>>>> >>>>>> query optimization [1] in Calcite.
>>>> >>>>>> 4. probably SQL shell support.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> [1]:
>>>> >>>>>>
>>>> >>>
>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> -Rui
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <[email protected]>
>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> it would be very nice to see a POC of your work.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
>>>> [email protected]>
>>>> >>> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Hi Julian,
>>>> >>>>>>>>
>>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it
>>>> >>> would
>>>> >>>>>>> help
>>>> >>>>>>>> to post our code for a quick preview.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks,
>>>> >>>>>>>> Botong
>>>> >>>>>>>>
>>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <[email protected]
>>>> >
>>>> >>> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Hi Julian,
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan that
>>>> best
>>>> >>>>>>> benefits
>>>> >>>>>>>>> the community. Here are some clarifications that hopefully
>>>> answer
>>>> >>> your
>>>> >>>>>>>>> questions.
>>>> >>>>>>>>>
>>>> >>>>>>>>> In our work (Tempura), users specify the set of time points to
>>>> >>>>>>> consider
>>>> >>>>>>>>> running and a cost function that expresses users' preference
>>>> over
>>>> >>>>>>> time,
>>>> >>>>>>>>> Tempura will generate the best incremental plan that
>>>> minimizes the
>>>> >>>>>>>> overall
>>>> >>>>>>>>> cost function.
>>>> >>>>>>>>>
>>>> >>>>>>>>> In this incremental plan, the sub-plans at different time
>>>> points
>>>> >>> can
>>>> >>>>>>> be
>>>> >>>>>>>>> different from each other, as opposed to identical plans in
>>>> all
>>>> >>> delta
>>>> >>>>>>>> runs
>>>> >>>>>>>>> as in streaming or IVM. As mentioned in $2.1 of the Tempura
>>>> paper,
>>>> >>> we
>>>> >>>>>>> can
>>>> >>>>>>>>> mimic the current streaming implementation by specifying two
>>>> >>> (logical)
>>>> >>>>>>>> time
>>>> >>>>>>>>> points in Tempura, representing the initial run and later
>>>> delta
>>>> >>> runs
>>>> >>>>>>>>> respectively. In general, note that Tempura supports various
>>>> form
>>>> >>> of
>>>> >>>>>>>>> incremental computing, not only the small-delta append-only
>>>> data
>>>> >>>>>>> model in
>>>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes the
>>>> >>> current
>>>> >>>>>>>>> streaming support, as well as any IVM implementations.
>>>> >>>>>>>>>
>>>> >>>>>>>>> About the cost model, we did not come up with a seperate cost
>>>> >>> model,
>>>> >>>>>>> but
>>>> >>>>>>>>> rather extended the existing one. Similar to multi-objective
>>>> >>>>>>>> optimization,
>>>> >>>>>>>>> costs incurred at different time points are considered
>>>> different
>>>> >>>>>>>>> dimensions. Tempura lets users supply a function that
>>>> converts this
>>>> >>>>>>> cost
>>>> >>>>>>>>> vector into a final cost. So under this function, any two
>>>> >>> incremental
>>>> >>>>>>>> plans
>>>> >>>>>>>>> are still comparable and there is an overall optimum. I guess
>>>> we
>>>> >>> can
>>>> >>>>>>> go
>>>> >>>>>>>>> down the route of multi-objective parametric query
>>>> optimization
>>>> >>>>>>> instead
>>>> >>>>>>>> if
>>>> >>>>>>>>> there is a need.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Next on materialized views and multi-query optimization,
>>>> since our
>>>> >>>>>>>>> multi-time-point plan naturally involves materializing
>>>> intermediate
>>>> >>>>>>>> results
>>>> >>>>>>>>> for later time points, we need to solve the problem of
>>>> choosing
>>>> >>>>>>>>> materializations and include the cost of saving and reusing
>>>> the
>>>> >>>>>>>>> materializations when costing and comparing plans. We
>>>> borrowed the
>>>> >>>>>>>>> multi-query optimization techniques to solve this problem even
>>>> >>> though
>>>> >>>>>>> we
>>>> >>>>>>>>> are looking at a single query. As a result, we think our work
>>>> is
>>>> >>>>>>>> orthogonal
>>>> >>>>>>>>> to Calcite's facilities around utilizing existing views,
>>>> lattice
>>>> >>> etc.
>>>> >>>>>>> We
>>>> >>>>>>>> do
>>>> >>>>>>>>> feel that the multi-query optimization component can be
>>>> adopted to
>>>> >>>>>>> wider
>>>> >>>>>>>>> use, but probably need more suggestions from the community.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Lastly, our current implementation is set up in java code, it
>>>> >>> should
>>>> >>>>>>> be
>>>> >>>>>>>>> straightforward to hook it up with SQL shell.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks,
>>>> >>>>>>>>> Botong
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>>>> >>> [email protected]>
>>>> >>>>>>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Botong,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> This is very exciting; congratulations on this research, and
>>>> thank
>>>> >>>>>>> you
>>>> >>>>>>>>>> for contributing it back to Calcite.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
>>>> >>>>>>> materialized
>>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we have
>>>> already
>>>> >>>>>>> some
>>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
>>>> operators,
>>>> >>>>>>> lattice,
>>>> >>>>>>>>>> and Spool operator), it will be interesting to see whether
>>>> we can
>>>> >>>>>>> make
>>>> >>>>>>>> them
>>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Your work differs from streaming queries in that your
>>>> relations
>>>> >>> are
>>>> >>>>>>> used
>>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
>>>> queries, the
>>>> >>>>>>> only
>>>> >>>>>>>>>> activity is the change propagation. Did you find that you
>>>> needed
>>>> >>> two
>>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
>>>> another for
>>>> >>>>>>> “user
>>>> >>>>>>>>>> queries” - since the objectives of each activity are so
>>>> different?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
>>>> >>> multi-objective
>>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I will make time over the next few days to read and digest
>>>> your
>>>> >>>>>>> paper.
>>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process to
>>>> create
>>>> >>>>>>>>>> something that will be useful for the broader community.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> One thing will be particularly useful: making this
>>>> functionality
>>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
>>>> with
>>>> >>> this
>>>> >>>>>>>>>> functionality without writing Java code or setting up complex
>>>> >>>>>>> databases
>>>> >>>>>>>> and
>>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
>>>> operations
>>>> >>>>>>> that
>>>> >>>>>>>> are
>>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we
>>>> could
>>>> >>>>>>> devise
>>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Julian
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> [1]
>>>> >>>>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>
>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <[email protected]
>>>> >
>>>> >>>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please
>>>> >>> refer
>>>> >>>>>>> to
>>>> >>>>>>>>>> Fig
>>>> >>>>>>>>>>> 3(a) in our paper:
>>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best,
>>>> >>>>>>>>>>> Botong
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>>>> [email protected]>
>>>> >>>>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Seems interesting, the pic can not be seen in the mail,
>>>> may you
>>>> >>>>>>> open
>>>> >>>>>>>> a
>>>> >>>>>>>>>> JIRA
>>>> >>>>>>>>>>>> for this, people who are interested in this can subscribe
>>>> to the
>>>> >>>>>>>> JIRA?
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Regards!
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Aron Tao
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Botong Huang <[email protected]> 于2020年12月24日周四 上午3:18写道:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Hi all,
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
>>>> >>> general
>>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper
>>>> >>>>>>> published
>>>> >>>>>>>> in
>>>> >>>>>>>>>>>> VLDB
>>>> >>>>>>>>>>>>> 2021:
>>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
>>>> >>> incremental
>>>> >>>>>>>> data
>>>> >>>>>>>>>>>>> processing
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
>>>> Alibaba’s
>>>> >>>>>>> data
>>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
>>>> optimizer
>>>> >>> to
>>>> >>>>>>>>>>>> alleviate
>>>> >>>>>>>>>>>>> cluster-wise resource skewness:
>>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
>>>> >>> Incremental
>>>> >>>>>>>>>>>> Computing
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> To our best knowledge, this is the first general
>>>> cost-based
>>>> >>>>>>>>>> incremental
>>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
>>>> families
>>>> >>> of
>>>> >>>>>>>>>>>>> incremental computing methods, including IVM, Streaming,
>>>> >>>>>>> DBToaster,
>>>> >>>>>>>>>> etc.
>>>> >>>>>>>>>>>>> Experiments (in the paper) shows that the generated best
>>>> plan
>>>> >>> is
>>>> >>>>>>>>>>>>> consistently much better than the plans from each
>>>> individual
>>>> >>>>>>> method
>>>> >>>>>>>>>>>> alone.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> In general, incremental query planning is central to
>>>> database
>>>> >>>>>>> view
>>>> >>>>>>>>>>>>> maintenance and stream processing systems, and are being
>>>> >>> adopted
>>>> >>>>>>> in
>>>> >>>>>>>>>>>> active
>>>> >>>>>>>>>>>>> databases, resumable query execution, approximate query
>>>> >>>>>>> processing,
>>>> >>>>>>>>>> etc.
>>>> >>>>>>>>>>>> We
>>>> >>>>>>>>>>>>> are hoping that this feature can help widening the
>>>> spectrum of
>>>> >>>>>>>>>> Calcite,
>>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
>>>> Please
>>>> >>>>>>> refer
>>>> >>>>>>>> to
>>>> >>>>>>>>>>>> the
>>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working on a
>>>> >>> journal
>>>> >>>>>>>>>> version
>>>> >>>>>>>>>>>> of
>>>> >>>>>>>>>>>>> the paper with more implementation details.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to
>>>> be
>>>> >>>>>>>> executed
>>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo will
>>>> be
>>>> >>>>>>> extended
>>>> >>>>>>>>>> with
>>>> >>>>>>>>>>>>> temporal information so that it is capable of generating
>>>> >>>>>>> incremental
>>>> >>>>>>>>>>>> plans
>>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at different
>>>> time
>>>> >>>>>>> points.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes
>>>> over
>>>> >>> time
>>>> >>>>>>>>>> (Time
>>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we introduced
>>>> >>>>>>> TvrMetaSet
>>>> >>>>>>>>>> into
>>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track
>>>> related
>>>> >>>>>>> RelSets
>>>> >>>>>>>>>> of a
>>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain
>>>> time,
>>>> >>>>>>> delta of
>>>> >>>>>>>>>> the
>>>> >>>>>>>>>>>>> table between two time points, etc.).
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> [image: image.png]
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> For example in the above figure, each vertical line is a
>>>> >>>>>>> TvrMetaSet
>>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
>>>> >>> Horizontal
>>>> >>>>>>>> lines
>>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet.
>>>> Users
>>>> >>> can
>>>> >>>>>>>>>> write
>>>> >>>>>>>>>>>> TVR
>>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations between
>>>> these
>>>> >>>>>>> dots.
>>>> >>>>>>>>>> For
>>>> >>>>>>>>>>>>> example, the blues lines are inter-TVR rules that
>>>> describe how
>>>> >>> to
>>>> >>>>>>>>>> compute
>>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The
>>>> red
>>>> >>> lines
>>>> >>>>>>>> are
>>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a
>>>> TVR. All
>>>> >>>>>>> TVR
>>>> >>>>>>>>>>>> rewrite
>>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still
>>>> work
>>>> >>> in
>>>> >>>>>>>> the
>>>> >>>>>>>>>> new
>>>> >>>>>>>>>>>>> volcano system without modification.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
>>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
>>>> >>>>>>> RelNodes,
>>>> >>>>>>>>>> as
>>>> >>>>>>>>>>>>> well as links in between the nodes.
>>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded
>>>> rule
>>>> >>>>>>> engine
>>>> >>>>>>>>>> API.
>>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
>>>> incremental
>>>> >>>>>>> plan
>>>> >>>>>>>>>>>>> involving multiple time points.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus
>>>> when
>>>> >>>>>>>>>> disabled,
>>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this
>>>> >>>>>>>>>> Calcite-extended
>>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic query
>>>> called
>>>> >>>>>>> the
>>>> >>>>>>>>>>>> ‘‘range
>>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
>>>> savings
>>>> >>> of
>>>> >>>>>>> 80%
>>>> >>>>>>>>>> on
>>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on end-to-end
>>>> >>> execution
>>>> >>>>>>>>>> time.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
>>>> >>>>>>> holidays!
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>> Botong
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> ~~~~~~~~~~~~~~~
>>>> >>>>>>> no mistakes
>>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>
>>>> >>
>>>>
>>>>

Reply via email to