Dear Members of Calcite Community, I'm working on Apache Beam SQL and we use Calcite for query optimization. We represent both tables and streams as a subclass of AbstractQueryableTable. In calcite implementation of cost model and statistics, one of the key elements is row count. Also all the nodes can present a rowcount estimation based on their inputs. For instance, in the case of joins, the rowcount is estimated by: left.rowCount*right.rowCount*selectivity_estimate.
My first question is, what is the correct way of representing streams in Calcite's Optimizer? Does calcite still uses row_count for streams? If so, what does row count represent in case of streams? In [1] they are suggesting to use both window (size of window in terms of tuples) and rate to represent output of all nodes in stream processing systems, and for every node these two values are estimated. For instance, they suggest to estimate window and rate of the joins using: join_rate = (left_rate*right_window + right_rate*left_window)*selectivitiy join_window = (left_window*right_window)*selectivitiy We were thinking about using this approach for Beam SQL; however, I am wondering where would be the point of extension? I was thinking to implement computeSelfCost() using a different cost model (rate, window_size) for our physical Rel Nodes, in which we don't call estimate_row_count and instead we use inputs' non cumulative cost to estimate the node's cost. However, I am not sure if this is a good approach and whether this can potentially cause problems in the optimization (because there will still be logical nodes that are implemented in calcite and may use row count estimation). Does calcite uses cost estimation for logical nodes such as logical join? or it only calculates the cost when the nodes are physical? I will appreciate if someone can help me. I will also appreciate if someone has other suggestions for streams query optimization. Best, Alireza Samadian [1] Ayad, Ahmed M., and Jeffrey F. Naughton. "Static optimization of conjunctive queries with sliding windows over infinite streams." *Proceedings of the 2004 ACM SIGMOD international conference on Management of data*. ACM, 2004.
