Seems interesting, the pic can not be seen in the mail, may you open a JIRA for this, people who are interested in this can subscribe to the JIRA?
Regards! Aron Tao Botong Huang <[email protected]> 于2020年12月24日周四 上午3:18写道: > Hi all, > > This is a proposal to extend the Calcite optimizer into a general > incremental query optimizer, based on our research paper published in VLDB > 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > > We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data > warehouse is planning to use this incremental query optimizer to alleviate > cluster-wise resource skewness: > Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing > > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of > incremental computing methods, including IVM, Streaming, DBToaster, etc. > Experiments (in the paper) shows that the generated best plan is > consistently much better than the plans from each individual method alone. > > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. > > Below is a brief description of the technical details. Please refer to the > Tempura paper for more details. We are also working on a journal version of > the paper with more implementation details. > > Currently the query plan generated by Calcite is meant to be executed > altogether at once. In the proposal, Calcite’s memo will be extended with > temporal information so that it is capable of generating incremental plans > that include multiple sub-plans to execute at different time points. > > The main idea is to view each table as one that changes over time (Time > Varying Relations (TVR)). To achieve that we introduced TvrMetaSet into > Calcite’s memo besides RelSet and RelSubset to track related RelSets of a > changing table (e.g. snapshot of the table at certain time, delta of the > table between two time points, etc.). > > [image: image.png] > > For example in the above figure, each vertical line is a TvrMetaSet > representing a TVR (S, R, S left outer join R, etc.). Horizontal lines > represent time. Each black dot in the grid is a RelSet. Users can write TVR > Rewrite Rules to describe valid transformations between these dots. For > example, the blues lines are inter-TVR rules that describe how to compute > certain RelSet of a TVR from RelSets of other TVRs. The red lines are > intra-TVR rules that describe transformations within a TVR. All TVR rewrite > rules are logical rules. All existing Calcite rules still work in the new > volcano system without modification. > > All changes in this feature will consist of four parts: > 1. Memo extension with TvrMetaSet > 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as > well as links in between the nodes. > 3. A basic set of TvrRules, written using the upgraded rule engine API. > 4. Multi-query optimization, used to find the best incremental plan > involving multiple time points. > > Note that this feature is an extension in nature and thus when disabled, > does not change any existing Calcite behavior. > > Other than scenarios in the paper, we also applied this Calcite-extended > incremental query optimizer to a type of periodic query called the ‘‘range > query’’ in Alibaba’s data warehouse. It achieved cost savings of 80% on > total CPU and memory consumption, and 60% on end-to-end execution time. > > All comments and suggestions are welcome. Thanks and happy holidays! > > Best, > Botong >
