Botong,

This is very exciting; congratulations on this research, and thank you for contributing it back to Calcite.

The research touches several areas in Calcite: streaming, materialized view maintenance, and multi-query optimization. As we already have some solutions in those areas (Sigma and Delta relational operators, lattice, and Spool operator), it will be interesting to see whether we can make them compatible, or whether one concept can subsume others.

Your work differs from streaming queries in that your relations are used by “external” user queries, whereas in pure streaming queries, the only activity is the change propagation. Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?

I wonder whether this work will hasten the arrival of multi-objective parametric query optimization [1] in Calcite.

I will make time over the next few days to read and digest your paper. Then I expect that we will have a back-and-forth process to create something that will be useful for the broader community.

One thing will be particularly useful: making this functionality available from a SQL shell, so that people can experiment with it without writing Java code or setting up complex databases and metadata. I have in mind something like the simple DDL operations that are available in Calcite’s ‘server’ module. I wonder whether we could devise some kind of SQL syntax for a “multi-query”.

Julian

[1] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext

> On Dec 23, 2020, at 8:55 PM, Botong Huang <[email protected]> wrote:
>
> Thanks Aron for pointing this out. To see the figure, please refer to Fig
> 3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>
> Best,
> Botong
>
> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <[email protected]> wrote:
>
>> Seems interesting. The pic cannot be seen in the mail; could you open a
>> JIRA for this, so that people who are interested can subscribe to it?
>>
>>
>> Regards!
>>
>> Aron Tao
>>
>>
>> Botong Huang <[email protected]> wrote on Thu, Dec 24, 2020 at 3:18 AM:
>>
>>> Hi all,
>>>
>>> This is a proposal to extend the Calcite optimizer into a general
>>> incremental query optimizer, based on our research paper published in
>>> VLDB 2021:
>>> Tempura: a general cost-based optimizer framework for incremental data
>>> processing
>>>
>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
>>> warehouse is planning to use this incremental query optimizer to
>>> alleviate cluster-wise resource skewness:
>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
>>> Computing
>>>
>>> To the best of our knowledge, this is the first general cost-based
>>> incremental optimizer that can find the best plan across multiple
>>> families of incremental computing methods, including IVM, Streaming,
>>> DBToaster, etc. Experiments (in the paper) show that the generated best
>>> plan is consistently much better than the plans from each individual
>>> method alone.
>>>
>>> In general, incremental query planning is central to database view
>>> maintenance and stream processing systems, and is being adopted in
>>> active databases, resumable query execution, approximate query
>>> processing, etc. We are hoping that this feature can help widen the
>>> spectrum of Calcite, and solicit more use cases and adoption of Calcite.
>>>
>>> Below is a brief description of the technical details. Please refer to
>>> the Tempura paper for more details. We are also working on a journal
>>> version of the paper with more implementation details.
>>>
>>> Currently the query plan generated by Calcite is meant to be executed
>>> altogether at once. In the proposal, Calcite’s memo will be extended with
>>> temporal information so that it is capable of generating incremental
>>> plans that include multiple sub-plans to execute at different time
>>> points.
>>>
>>> The main idea is to view each table as one that changes over time (Time
>>> Varying Relations (TVR)). To achieve that, we introduced TvrMetaSet into
>>> Calcite’s memo besides RelSet and RelSubset to track related RelSets of a
>>> changing table (e.g. snapshot of the table at a certain time, delta of
>>> the table between two time points, etc.).
>>>
>>> [image: image.png]
>>>
>>> For example in the above figure, each vertical line is a TvrMetaSet
>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
>>> represent time. Each black dot in the grid is a RelSet. Users can write
>>> TVR Rewrite Rules to describe valid transformations between these dots.
>>> For example, the blue lines are inter-TVR rules that describe how to
>>> compute a certain RelSet of a TVR from RelSets of other TVRs. The red
>>> lines are intra-TVR rules that describe transformations within a TVR.
>>> All TVR rewrite rules are logical rules. All existing Calcite rules still
>>> work in the new volcano system without modification.
>>>
>>> All changes in this feature will consist of four parts:
>>> 1. Memo extension with TvrMetaSet
>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
>>> well as links in between the nodes.
>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
>>> 4. Multi-query optimization, used to find the best incremental plan
>>> involving multiple time points.
>>>
>>> Note that this feature is an extension in nature and thus, when
>>> disabled, does not change any existing Calcite behavior.
>>>
>>> Other than scenarios in the paper, we also applied this Calcite-extended
>>> incremental query optimizer to a type of periodic query called the
>>> “range query” in Alibaba’s data warehouse. It achieved cost savings of
>>> 80% on total CPU and memory consumption, and 60% on end-to-end execution
>>> time.
>>>
>>> All comments and suggestions are welcome. Thanks and happy holidays!
>>>
>>> Best,
>>> Botong
>>>
>>
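
For readers new to the snapshot/delta vocabulary in the quoted proposal, here is a minimal sketch of the idea, not Calcite or Tempura code. It assumes insert-only changes, uses an inner join rather than the left outer join from the figure, and models relations as Python multisets; the names `join`, `s_t1`, `r_delta`, and so on are hypothetical. It checks that a join snapshot at a later time can be obtained either by recomputing from scratch or by adding a delta built from the inputs' snapshots and deltas.

```python
from collections import Counter

def join(s, r):
    """Inner-join two multisets of (key, value) rows on key."""
    out = Counter()
    for (ks, vs), cs in s.items():
        for (kr, vr), cr in r.items():
            if ks == kr:
                out[(ks, vs, vr)] += cs * cr
    return out

# Snapshots of S and R at time t1, and insert-only deltas from t1 to t2.
s_t1 = Counter({(1, "a"): 1, (2, "b"): 1})
r_t1 = Counter({(1, "x"): 1})
s_delta = Counter({(3, "c"): 1})
r_delta = Counter({(2, "y"): 1, (3, "z"): 1})

# Within one relation: snapshot(t2) = snapshot(t1) + delta(t1, t2).
s_t2 = s_t1 + s_delta
r_t2 = r_t1 + r_delta

# Across relations: the delta of the join, computed from the snapshots
# and deltas of its inputs (valid for insert-only changes).
join_delta = join(s_delta, r_t1) + join(s_t1, r_delta) + join(s_delta, r_delta)

# The incremental result matches a full recomputation at t2.
incremental = join(s_t1, r_t1) + join_delta
full = join(s_t2, r_t2)
assert incremental == full
```

Each of these equations is one way to reach the same result from different starting points; a cost-based optimizer would choose among such alternatives, which is the role the proposal gives to rewrite rules over the memo.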
