Re: Proposal to extend Calcite into a incremental query optimizer

JiaTao Tao Wed, 23 Dec 2020 19:20:58 -0800

Seems interesting, the pic can not be seen in the mail, may you open a JIRA
for this, people who are interested in this can subscribe to the JIRA?



Regards!

Aron Tao


Botong Huang <[email protected]> 于2020年12月24日周四 上午3:18写道：

> Hi all,
>
> This is a proposal to extend the Calcite optimizer into a general
> incremental query optimizer, based on our research paper published in VLDB
> 2021:
> Tempura: a general cost-based optimizer framework for incremental data
> processing
>
> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> warehouse is planning to use this incremental query optimizer to alleviate
> cluster-wise resource skewness:
> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing
>
> To our best knowledge, this is the first general cost-based incremental
> optimizer that can find the best plan across multiple families of
> incremental computing methods, including IVM, Streaming, DBToaster, etc.
> Experiments (in the paper) shows that the generated best plan is
> consistently much better than the plans from each individual method alone.
>
> In general, incremental query planning is central to database view
> maintenance and stream processing systems, and are being adopted in active
> databases, resumable query execution, approximate query processing, etc. We
> are hoping that this feature can help widening the spectrum of Calcite,
> solicit more use cases and adoption of Calcite.
>
> Below is a brief description of the technical details. Please refer to the
> Tempura paper for more details. We are also working on a journal version of
> the paper with more implementation details.
>
> Currently the query plan generated by Calcite is meant to be executed
> altogether at once. In the proposal, Calcite’s memo will be extended with
> temporal information so that it is capable of generating incremental plans
> that include multiple sub-plans to execute at different time points.
>
> The main idea is to view each table as one that changes over time (Time
> Varying Relations (TVR)). To achieve that we introduced TvrMetaSet into
> Calcite’s memo besides RelSet and RelSubset to track related RelSets of a
> changing table (e.g. snapshot of the table at certain time, delta of the
> table between two time points, etc.).
>
> [image: image.png]
>
> For example in the above figure, each vertical line is a TvrMetaSet
> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
> represent time. Each black dot in the grid is a RelSet. Users can write TVR
> Rewrite Rules to describe valid transformations between these dots. For
> example, the blues lines are inter-TVR rules that describe how to compute
> certain RelSet of a TVR from RelSets of other TVRs. The red lines are
> intra-TVR rules that describe transformations within a TVR. All TVR rewrite
> rules are logical rules. All existing Calcite rules still work in the new
> volcano system without modification.
>
> All changes in this feature will consist of four parts:
> 1. Memo extension with TvrMetaSet
> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
> well as links in between the nodes.
> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> 4. Multi-query optimization, used to find the best incremental plan
> involving multiple time points.
>
> Note that this feature is an extension in nature and thus when disabled,
> does not change any existing Calcite behavior.
>
> Other than scenarios in the paper, we also applied this Calcite-extended
> incremental query optimizer to a type of periodic query called the ‘‘range
> query’’ in Alibaba’s data warehouse. It achieved cost savings of 80% on
> total CPU and memory consumption, and 60% on end-to-end execution time.
>
> All comments and suggestions are welcome. Thanks and happy holidays!
>
> Best,
> Botong
>

Re: Proposal to extend Calcite into a incremental query optimizer

Reply via email to