Hi Stamatis,

Thanks for the quick reply; this seems to be exactly what I was looking for! Is there any literature or are there any articles you could recommend on optimization and preprocessing? I've found reading through the Calcite docs and source code a tad tedious, and I'm wondering whether there is a better way to learn about this topic than reaching out to the mailing list whenever I come across something new and interesting.

Thanks,
Logan

On Fri, May 10, 2024 at 1:55 AM Stamatis Zampetakis <zabe...@gmail.com> wrote:

> Hey Logan,
>
> Many parts of Calcite such as rules and metadata providers rely on the
> assumption that the RelNode tree does not contain subqueries. This is
> achieved by using the SubQueryRemoveRule [1] early on during the
> optimization process. Another pretty common preprocessing step is
> query decorrelation [2].
>
> Best,
> Stamatis
>
> [1] https://github.com/apache/calcite/blob/f854ef5ee480e0ff893b18d27ec67dc381ee2244/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java
> [2] https://github.com/apache/calcite/blob/f854ef5ee480e0ff893b18d27ec67dc381ee2244/core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java#L147
>
> On Fri, May 10, 2024 at 5:14 AM JinxTheKid <logansmith...@gmail.com> wrote:
> >
> > Hi all, I am new to Calcite so apologies for what might be a basic
> > question.
> >
> > I'm working with the RelTree, and trying to understand the standard way to
> > collect column origins for a RelNode. My current strategy to collect column
> > origins for some arbitrary node is to use the node's metadataQuery field,
> > in conjunction with some list of indices (maybe its projection indices,
> > join indices, something of the like).
> >
> > This strategy works for simple queries, but fails when dealing with
> > subqueries (in particular, I'm concerned with subqueries inside of
> > projections). I have a variant of my strategy that uses RexVisitor to visit
> > all RexSubqueries I find inside of projections, but this strategy feels a
> > bit verbose. Is there a builtin (or better) way to "deeply" collect column
> > origins besides performing this aforementioned strategy?
> >
> > Thanks,
> > Logan