Botong,
    It is very exciting. Thank you for your contribution to the 
community. I'm very interested in materialization and incremental computing. We 
know that there are many challenges in this area, such as choosing a good 
execution plan, materializing a good view, effectively hitting the target, 
materializing and identifying the view, and implementing optimization rules in 
specific fields. It can speed up query optimization. Thank you for your sharing 
and look forward to more discussion and feedback.


ZhaoHui Xu


------------------ ???????? ------------------
??????:                                                                         
                                               "dev"                            
                                                        <bot...@apache.org&gt;;
????????:&nbsp;2020??12??24??(??????) ????3:17
??????:&nbsp;"dev"<dev@calcite.apache.org&gt;;"zengkai.zk"<zengkai...@alibaba-inc.com&gt;;"zuozhiw"<zuoz...@gmail.com&gt;;

????:&nbsp;Proposal to extend Calcite into a incremental query optimizer



Hi all,

This is a proposal to extend the Calcite optimizer into a general incremental 
query optimizer, based on our research paper published in VLDB 2021:
Tempura: a general cost-based optimizer framework for incremental data 
processing

We also have a demo in SIGMOD 2020 illustrating how Alibaba??s data warehouse 
is planning to use this incremental query optimizer to alleviate cluster-wise 
resource skewness:
Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing

To our best knowledge, this is the first general cost-based incremental 
optimizer that can find the best plan across multiple families of incremental 
computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in 
the paper) shows that the generated best plan is consistently much better than 
the plans from each individual method alone.

In general, incremental query planning is central to database view maintenance 
and stream processing systems, and are being adopted in active databases, 
resumable query execution, approximate query processing, etc. We are hoping 
that this feature can help widening the spectrum of Calcite, solicit more use 
cases and adoption of Calcite.

Below is a brief description of the technical details. Please refer to the 
Tempura paper for more details. We are also working on a journal version of the 
paper with more implementation details.

Currently the query plan generated by Calcite is meant to be executed 
altogether at once. In the proposal, Calcite??s memo will be extended with 
temporal information so that it is capable of generating incremental plans that 
include multiple sub-plans to execute at different time points.

The main idea is to view each table as one that changes over time (Time Varying 
Relations (TVR)). To achieve that we introduced TvrMetaSet into Calcite??s memo 
besides RelSet and RelSubset to track related RelSets of a changing table (e.g. 
snapshot of the table at certain time, delta of the table between two time 
points, etc.).



For example in the above figure, each vertical line is a TvrMetaSet 
representing a TVR (S, R, S left outer join R, etc.). Horizontal lines 
represent time. Each black dot in the grid is a RelSet. Users can write TVR 
Rewrite Rules to describe valid transformations between these dots. For 
example, the blues lines are inter-TVR rules that describe how to compute 
certain RelSet of a TVR from RelSets of other TVRs. The red lines are intra-TVR 
rules that describe transformations within a TVR. All TVR rewrite rules are 
logical rules. All existing Calcite rules still work in the new volcano system 
without modification. 

All changes in this feature will consist of four parts:
1. Memo extension with TvrMetaSet
2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as well as 
links in between the nodes.
3. A basic set of TvrRules, written using the upgraded rule engine API.
4. Multi-query optimization, used to find the best incremental plan involving 
multiple time points.

Note that this feature is an extension in nature and thus when disabled, does 
not change any existing Calcite behavior.

Other than scenarios in the paper, we also applied this Calcite-extended 
incremental query optimizer to a type of periodic query called the ????range 
query???? in Alibaba??s data warehouse. It achieved cost savings of 80% on 
total CPU and memory consumption, and 60% on end-to-end execution time. 

All comments and suggestions are welcome. Thanks and happy holidays!&nbsp;


Best,&nbsp;
Botong

Reply via email to