Hey Askar, I am happy to inform that I could make a little more progress, and here is the gist - https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67.
Although I am able to do this, I don't think I have reached my goal yet. A question that came to mind was this : - Is it not possible to add a node to a tree when the sql query has been parsed to RelNode after validation? - If I wish to apply a similar approach to a RelRule which I am trying to enforce, is that not expected? (an anti-pattern) - Is there a way to generate the RelBuilder version of the same Sql query? (SqlToRelConverter?) My approach in this gist : get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==> find SqlSelect (since there's getHaving and setHaving) ==> create a new dummy query and extract having from the query ==> set Having for the current query at hand via the dummy query. Note : For no particular reason I have used only visit by SqlCall, was trying to build up on what you shared and the other visit methods didn't seem to fit. The approach I have in mind for a RelRule : use ConverterRule ==> find the node under onMatch as Aggregate ==> then use call.transformTo to add a having clause to the existing aggregate node if there's no having clause present using the RelBuilder, but that does not seem feasible. (below is what I was thinking, original query : SELECT o_custkey, COUNT(*) AS C FROM orders GROUP BY o_custkey) [image: Screenshot 2023-05-10 at 2.09.41 AM.png] Maybe I am missing something. Please feel free to share your comments. :) Thank you again! With regards, Soumyadeep Mukhopadhyay. On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay < [email protected]> wrote: > Hey Askar, > > Thank you so much for your email. Please take as much time as you need. I > will keep trying in the meantime. :) > > With regards, > Soumyadeep Mukhopadhyay. > > On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <[email protected]> wrote: > >> Hey, sorry for the late reply but I had some stuff to do IRL & some >> deadlines I need to meet right now. I'll try to answer you as soon as I >> can, but this week doesn't seem possible. >> >> Keep the questions coming, however, if you haven't solved them yourself >> (and if you did, please share your solutions). I'm planning to write a >> sort-of "Comprehensive Calcite 101" documentation in the near future which >> I am going to be basing on my own experiences & questions in the >> community. >> >> Kind regards, >> Askar >> >> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay < >> [email protected]> >> wrote: >> >> > Hello Askar, >> > >> > Thank you so much for guiding me, I have taken some inspiration from >> your >> > code and from Dremio's implementation and compiled this - >> > >> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b >> > . >> > >> > I have a few follow-up questions, if I may, regarding what I have >> > implemented: >> > >> > - is it possible to implement this as a rule, probably a RelRule (I am >> > saying a RelRule because as far as I have understood RelRule is the way >> you >> > implement rules that you would like to enforce before the physical plan >> is >> > created)? >> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is >> > present, if not then add it in the query (probably transform it).* >> > *research done so far* : I probably need to create a RelRule and >> implement >> > the "matches" and "onMatch" methods. The "onMatch" method should >> transform >> > my rel or rex node using transformTo, probably. >> > Something like >> > >> > >> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html >> > is >> > something which I might need to implement. >> > But a few methods are hard to grasp for me and they are: >> > >> > - transformTo >> > - RelRule.Config.withOperandFor() >> > >> > Could you please give me some insight on how I can understand their >> > structures? >> > >> > - is it possible to find whether a node (sql/rel) is GROUP BY or not? >> > - if there is an AND clause in WHERE, is there a way to parse all the >> > filter conditions separately? (using RexCall.getOperator did not work) >> > - is this (what I have done) only a hack or is there a better way to >> > implement this? >> > >> > I understand that my questions are probably not trivial, so I appreciate >> > that you are reading my emails and providing your valuable feedback. >> > >> > Thank you for your time and consideration. :) >> > >> > With regards, >> > Soumyadeep Mukhopadhyay. >> > >> > >> > >> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <[email protected]> >> > wrote: >> > >> > > Thank you for your kind words :) >> > > >> > > - Askar >> > > >> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay < >> > [email protected] >> > > > >> > > wrote: >> > > >> > > > Hello Askar, >> > > > >> > > > Thank you so much for taking the time to compile this for me. >> > > > >> > > > I shall be indebted to you for this effort. Please let me know if I >> can >> > > be >> > > > of any assistance ever to repay your kindness. >> > > > >> > > > I appreciate all your help and guidance. Thank you once again. I >> shall >> > go >> > > > through the same and will get back to you as soon as possible. Thank >> > you >> > > > again sir! :) >> > > > >> > > > With best regards, >> > > > Soumyadeep Mukhopadhyay. >> > > > >> > > > >> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan < >> [email protected]> >> > > > wrote: >> > > > >> > > > > Hello again! >> > > > > I have prepared this gist to perhaps help you understand how to >> > > traverse >> > > > > through a Sql parse tree: >> > > > > >> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f >> > > > > >> > > > > I have implemented a Filterer class that returns false on any >> queries >> > > > which >> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all other >> > > > queries. >> > > > > I have used Kotlin, so if you have difficulties understanding it >> (as >> > > > you're >> > > > > using Scala), check out Kotlin's smart cast and nullability. >> > > > > >> > > > > Good luck and hopefully that was helpful, >> > > > > Askar Bozcan >> > > > > >> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay < >> > > > > [email protected]> >> > > > > wrote: >> > > > > >> > > > > > Hi Askar, >> > > > > > >> > > > > > I have gone through your email a couple of times. I am trying to >> > > > > > understand it bit by bit. >> > > > > > >> > > > > > I have a request, please feel free to say no, is there any code >> > base >> > > > > where >> > > > > > I can refer and understand how to implement my own filter rule? >> > > > > > What I realised I asked wrong is I wanted to implement "(HAVING >> > > > COUNT(*) >> > > > > > > 100)" as a global rule, if there are any aggregation queries. >> > > > > > >> > > > > > It is a bit difficult for me to understand where I should start. >> > > > > > For example, if I follow this >> > > > > https://github.com/zabetak/calcite-tutorial where >> > > > > > shall I start looking at? >> > > > > > In the below image I feel like some kind of comparison is going >> on, >> > > > > should >> > > > > > I devise my code like this? (in order to implement something >> like >> > > > > "(HAVING >> > > > > > COUNT(*) > 100)") >> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png] >> > > > > > >> > > > > > Also is there any documentation I can go through regarding how I >> > can >> > > > > > traverse through the AST? >> > > > > > Probably I am not looking in the right places but so far I could >> > only >> > > > go >> > > > > > through Tab9 code examples (or the documentation provided by the >> > > > Calcite >> > > > > > website) and things did not seem to be clear. >> > > > > > I realise it is probably a lot to ask, so whatever you share >> will >> > be >> > > a >> > > > > lot >> > > > > > of help for me. >> > > > > > >> > > > > > Thanks again for your time, patience and help! >> > > > > > >> > > > > > With regards, >> > > > > > Soumyadeep Mukhopadhyay. >> > > > > > >> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan < >> [email protected] >> > > >> > > > > wrote: >> > > > > > >> > > > > >> Hey, >> > > > > >> You can use Planner.reset(). Note that it has mutable state >> > inside, >> > > so >> > > > > do >> > > > > >> not reuse the same planner instance in any multi-threading >> > > > environment. >> > > > > >> >> > > > > >> (I am assuming you have access to table metadata so that you >> will >> > be >> > > > > able >> > > > > >> to convert your SqlNode tree into RelNode tree, relational >> > > expression >> > > > > >> tree) >> > > > > >> - Only return results above a certain threshold when using >> GROUP >> > BY, >> > > > for >> > > > > >> example (HAVING COUNT(col1) > 100). >> > > > > >> >> > > > > >> I'm not quite sure I understand this question is HAVING part of >> > the >> > > > > query? >> > > > > >> >> > > > > >> - Restrict the column on which joins can happen, or else throw >> an >> > > > error >> > > > > >> (almost like analysisException in Spark) >> > > > > >> >> > > > > >> Do you have access to table/schema metadata? >> > > > > >> If you do: >> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical >> > > relational >> > > > > >> tree >> > > > > >> (RelNode). >> > > > > >> Watch this tutorial by Stamatis: >> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8. >> > > > > >> It will explain way better the usage than I can in an email. >> (And >> > it >> > > > > what >> > > > > >> introduced me to Calcite's basics :)) >> > > > > >> 2) Traverse your relational tree by implementing RelShuttle >> > > > > >> < >> > > > > >> >> > > > > >> > > > >> > > >> > >> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html >> > > > > >> >. >> > > > > >> (Look up Visitor pattern to better understand how this >> recursive >> > > > > traversal >> > > > > >> works) >> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin. >> > > > > >> 4) When you encounter LogicalJoin, traverse its children with >> > > getLeft >> > > > > and >> > > > > >> getRight >> > > > > >> 5) When you encounter LogicalTableScan, You can get its >> > RelOptTable >> > > > > with >> > > > > >> getTable, and then RelOptTable.getRowType() to find the fields. >> > > > > >> (Not 100% about step 5, but should be close to it) >> > > > > >> >> > > > > >> >> > > > > >> - Restrict the columns that can be passed in a select >> statement or >> > > > else >> > > > > >> throw an error (like in the point above) >> > > > > >> Same logic as above. But instead of TableScan, look at the root >> > > > > >> LogicalProject. >> > > > > >> >> > > > > >> All of the above should be doable with just the parse tree >> > (SqlNode) >> > > > and >> > > > > >> without access to table metadata using SqlShuttle; however, >> it's >> > > > easier >> > > > > >> and >> > > > > >> less error-prone with relational tree IMO. >> > > > > >> >> > > > > >> - Askar >> > > > > >> >> > > > > >> >> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay < >> > > > > >> [email protected]> >> > > > > >> wrote: >> > > > > >> >> > > > > >> > Hey Askar, >> > > > > >> > >> > > > > >> > It worked exactly how you suggested. >> > > > > >> > >> > > > > >> > ``` >> > > > > >> > >> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM >> > > > table_name >> > > > > >> > WHERE predicate_column = 'predicate_value'" >> > > > > >> > private val frameworkConfig: FrameworkConfig = >> > > > > >> > Frameworks.newConfigBuilder.build() >> > > > > >> > private val planner: Planner = >> > > > Frameworks.getPlanner(frameworkConfig) >> > > > > >> > private val planner2: Planner = >> > > > Frameworks.getPlanner(frameworkConfig) >> > > > > >> > private val planner3: Planner = >> > > > Frameworks.getPlanner(frameworkConfig) >> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery) >> > > > > >> > println(sqlNode.getKind) >> > > > > >> > private val sqlSelectStmt: SqlSelect = >> > > > sqlNode.asInstanceOf[SqlSelect] >> > > > > >> > >> > > > > >> > private val setSelectColumnsQuery = "SELECT age" >> > > > > >> > private val selectList = >> > > > > >> > >> > > > > >> >> > > > > >> > > > >> > > >> > >> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList >> > > > > >> > private val setFromTableQuery = "SELECT employee" >> > > > > >> > private val fromTable = >> > > > > >> > >> > > > > >> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList >> > > > > >> > >> > > > > >> > sqlSelectStmt.setSelectList(selectList) >> > > > > >> > sqlSelectStmt.setFrom(fromTable) >> > > > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode] >> > > > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT)) >> > > > > >> > >> > > > > >> > ``` >> > > > > >> > >> > > > > >> > The only caveat I see is the need for a new planner for every >> > new >> > > > > query. >> > > > > >> > Should I do something else or is this expected? >> > > > > >> > >> > > > > >> > On a different note, I wanted to ask about how I can write my >> > own >> > > > > >> rules. Is >> > > > > >> > it possible to enforce following rules: >> > > > > >> > - Only return results above a certain threshold when using >> GROUP >> > > BY, >> > > > > for >> > > > > >> > example (HAVING COUNT(col1) > 100). >> > > > > >> > - Restrict the column on which joins can happen, or else >> throw >> > an >> > > > > error >> > > > > >> > (almost like analysisException in Spark) >> > > > > >> > - Restrict the columns that can be passed in a select >> statement >> > or >> > > > > else >> > > > > >> > throw an error (like in the point above) >> > > > > >> > >> > > > > >> > Not sure if it is feasible, please feel free to suggest. :) >> > Thanks >> > > > > again >> > > > > >> > for your time! >> > > > > >> > >> > > > > >> > With regards, >> > > > > >> > Soumyadeep Mukhopadhyay. >> > > > > >> > >> > > > > >> > >> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin < >> > [email protected] >> > > > >> > > > > >> wrote: >> > > > > >> > >> > > > > >> > > Hey Soumyadeep, >> > > > > >> > > I think that can work with a few caveats. >> > > > > >> > > 0) Use the Planner from Frameworks >> > > > > >> > > < >> > > > > >> > > >> > > > > >> > >> > > > > >> >> > > > > >> > > > >> > > >> > >> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html >> > > > > >> > > > >> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root >> > > SqlNode) >> > > > > but >> > > > > >> do >> > > > > >> > > not validate it >> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect >> > > > > >> > > < >> > > > > >> > > >> > > > > >> > >> > > > > >> >> > > > > >> > > > >> > > >> > >> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html >> > > > > >> > > > >> > > > > >> > > (can >> > > > > >> > > verify that it's a SELECT by using getKind()) >> > > > > >> > > 3) Use the setters from SqlSelect to modify the children of >> > root >> > > > > >> > SqlSelect >> > > > > >> > > node. >> > > > > >> > > 4) Unparse the root query back into a string >> > > (SqlNode.unparse()). >> > > > > >> > > >> > > > > >> > > Now the tricky part is 3), as you have to set proper >> SqlNode >> > > types >> > > > > as >> > > > > >> > > children of the SELECT node for column names, table names >> and >> > > for >> > > > > >> WHERE >> > > > > >> > > predicate. I don't remember them properly, but what you >> can do >> > > is >> > > > > >> parse a >> > > > > >> > > proper query (again, don't validate it as you'll then need >> > table >> > > > > >> > metadata), >> > > > > >> > > and check the kinds (getKind()) of children SqlNode's and >> > > replace >> > > > > them >> > > > > >> > with >> > > > > >> > > your replacements using setters of root SqlSelect node. >> > > > > >> > > >> > > > > >> > > This should work, but I'm not 100% certain as I'm unable to >> > > check >> > > > > >> right >> > > > > >> > > now. >> > > > > >> > > >> > > > > >> > > PS: Take care about dialects. Dialects are used in >> unparsing >> > as >> > > a >> > > > > >> > > "configuration" of SqlWriter, and can unparse the syntax >> tree >> > > > > >> differently >> > > > > >> > > based on the dialect you have chosen. >> > > > > >> > > >> > > > > >> > > Kind regards, >> > > > > >> > > Askar Bozcan >> > > > > >> > > >> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay < >> > > > > >> [email protected]> >> > > > > >> > > wrote: >> > > > > >> > > >> > > > > >> > > Hello All, >> > > > > >> > > >> > > > > >> > > I have just heard of Apache Calcite and was exploring the >> > > > > >> possibilities. >> > > > > >> > I >> > > > > >> > > wish to achieve the following, and wanted to check if my >> hunch >> > > is >> > > > > >> > correct: >> > > > > >> > > - Use a template to build SQL queries, like use jinja-sql >> or >> > > even >> > > > > >> pebble >> > > > > >> > > (interpret the SqlNode tree kind of structure from my >> template >> > > and >> > > > > >> then >> > > > > >> > add >> > > > > >> > > the necessary fields like table name and group by fields >> from >> > an >> > > > > >> input) >> > > > > >> > > >> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would >> be >> > > > inside >> > > > > a >> > > > > >> > query >> > > > > >> > > template (in Jinja-sql it may look like "SELECT >> > {{select_fields >> > > | >> > > > > >> > sqlsafe}} >> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | >> > sqlsafe}}" >> > > > and >> > > > > >> > values >> > > > > >> > > like 'select_fields' would be substituted at run-time from >> a >> > > > > >> processing >> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for >> Java) >> > and >> > > > the >> > > > > >> > output >> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" >> with >> > > > some >> > > > > >> > dialect >> > > > > >> > > (like Snowflake or Big-Query). >> > > > > >> > > >> > > > > >> > > Is this possible? Any recommendations or suggestions are >> > > welcome. >> > > > > >> Even if >> > > > > >> > > the approach feels wrong please let me know. :) >> > > > > >> > > Thank you for your time and consideration. >> > > > > >> > > >> > > > > >> > > With regards, >> > > > > >> > > Soumyadeep Mukhopadhyay. >> > > > > >> > > >> > > > > >> > >> > > > > >> >> > > > > > >> > > > > >> > > > >> > > >> > >> >
