Hello Askar, Thank you so much for taking the time to compile this for me.
I shall be indebted to you for this effort. Please let me know if I can be of any assistance ever to repay your kindness. I appreciate all your help and guidance. Thank you once again. I shall go through the same and will get back to you as soon as possible. Thank you again sir! :) With best regards, Soumyadeep Mukhopadhyay. On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <[email protected]> wrote: > Hello again! > I have prepared this gist to perhaps help you understand how to traverse > through a Sql parse tree: > https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f > > I have implemented a Filterer class that returns false on any queries which > have 'HAVING COUNT > x' where x is <= 100, and true for all other queries. > I have used Kotlin, so if you have difficulties understanding it (as you're > using Scala), check out Kotlin's smart cast and nullability. > > Good luck and hopefully that was helpful, > Askar Bozcan > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay < > [email protected]> > wrote: > > > Hi Askar, > > > > I have gone through your email a couple of times. I am trying to > > understand it bit by bit. > > > > I have a request, please feel free to say no, is there any code base > where > > I can refer and understand how to implement my own filter rule? > > What I realised I asked wrong is I wanted to implement "(HAVING COUNT(*) > > > 100)" as a global rule, if there are any aggregation queries. > > > > It is a bit difficult for me to understand where I should start. > > For example, if I follow this > https://github.com/zabetak/calcite-tutorial where > > shall I start looking at? > > In the below image I feel like some kind of comparison is going on, > should > > I devise my code like this? (in order to implement something like > "(HAVING > > COUNT(*) > 100)") > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png] > > > > Also is there any documentation I can go through regarding how I can > > traverse through the AST? > > Probably I am not looking in the right places but so far I could only go > > through Tab9 code examples (or the documentation provided by the Calcite > > website) and things did not seem to be clear. > > I realise it is probably a lot to ask, so whatever you share will be a > lot > > of help for me. > > > > Thanks again for your time, patience and help! > > > > With regards, > > Soumyadeep Mukhopadhyay. > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <[email protected]> > wrote: > > > >> Hey, > >> You can use Planner.reset(). Note that it has mutable state inside, so > do > >> not reuse the same planner instance in any multi-threading environment. > >> > >> (I am assuming you have access to table metadata so that you will be > able > >> to convert your SqlNode tree into RelNode tree, relational expression > >> tree) > >> - Only return results above a certain threshold when using GROUP BY, for > >> example (HAVING COUNT(col1) > 100). > >> > >> I'm not quite sure I understand this question is HAVING part of the > query? > >> > >> - Restrict the column on which joins can happen, or else throw an error > >> (almost like analysisException in Spark) > >> > >> Do you have access to table/schema metadata? > >> If you do: > >> 1) Convert your parsed syntax tree (SqlNode) into a logical relational > >> tree > >> (RelNode). > >> Watch this tutorial by Stamatis: > >> https://www.youtube.com/watch?v=p1O3E33FIs8. > >> It will explain way better the usage than I can in an email. (And it > what > >> introduced me to Calcite's basics :)) > >> 2) Traverse your relational tree by implementing RelShuttle > >> < > >> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html > >> >. > >> (Look up Visitor pattern to better understand how this recursive > traversal > >> works) > >> 3) Ignore any RelNode's which are not LogicalJoin. > >> 4) When you encounter LogicalJoin, traverse its children with getLeft > and > >> getRight > >> 5) When you encounter LogicalTableScan, You can get its RelOptTable > with > >> getTable, and then RelOptTable.getRowType() to find the fields. > >> (Not 100% about step 5, but should be close to it) > >> > >> > >> - Restrict the columns that can be passed in a select statement or else > >> throw an error (like in the point above) > >> Same logic as above. But instead of TableScan, look at the root > >> LogicalProject. > >> > >> All of the above should be doable with just the parse tree (SqlNode) and > >> without access to table metadata using SqlShuttle; however, it's easier > >> and > >> less error-prone with relational tree IMO. > >> > >> - Askar > >> > >> > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay < > >> [email protected]> > >> wrote: > >> > >> > Hey Askar, > >> > > >> > It worked exactly how you suggested. > >> > > >> > ``` > >> > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM table_name > >> > WHERE predicate_column = 'predicate_value'" > >> > private val frameworkConfig: FrameworkConfig = > >> > Frameworks.newConfigBuilder.build() > >> > private val planner: Planner = Frameworks.getPlanner(frameworkConfig) > >> > private val planner2: Planner = Frameworks.getPlanner(frameworkConfig) > >> > private val planner3: Planner = Frameworks.getPlanner(frameworkConfig) > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery) > >> > println(sqlNode.getKind) > >> > private val sqlSelectStmt: SqlSelect = sqlNode.asInstanceOf[SqlSelect] > >> > > >> > private val setSelectColumnsQuery = "SELECT age" > >> > private val selectList = > >> > > >> > planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList > >> > private val setFromTableQuery = "SELECT employee" > >> > private val fromTable = > >> > > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList > >> > > >> > sqlSelectStmt.setSelectList(selectList) > >> > sqlSelectStmt.setFrom(fromTable) > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode] > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT)) > >> > > >> > ``` > >> > > >> > The only caveat I see is the need for a new planner for every new > query. > >> > Should I do something else or is this expected? > >> > > >> > On a different note, I wanted to ask about how I can write my own > >> rules. Is > >> > it possible to enforce following rules: > >> > - Only return results above a certain threshold when using GROUP BY, > for > >> > example (HAVING COUNT(col1) > 100). > >> > - Restrict the column on which joins can happen, or else throw an > error > >> > (almost like analysisException in Spark) > >> > - Restrict the columns that can be passed in a select statement or > else > >> > throw an error (like in the point above) > >> > > >> > Not sure if it is feasible, please feel free to suggest. :) Thanks > again > >> > for your time! > >> > > >> > With regards, > >> > Soumyadeep Mukhopadhyay. > >> > > >> > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <[email protected]> > >> wrote: > >> > > >> > > Hey Soumyadeep, > >> > > I think that can work with a few caveats. > >> > > 0) Use the Planner from Frameworks > >> > > < > >> > > > >> > > >> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html > >> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root SqlNode) > but > >> do > >> > > not validate it > >> > > 2) Cast the root SqlNode into a SqlSelect > >> > > < > >> > > > >> > > >> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html > >> > > > > >> > > (can > >> > > verify that it's a SELECT by using getKind()) > >> > > 3) Use the setters from SqlSelect to modify the children of root > >> > SqlSelect > >> > > node. > >> > > 4) Unparse the root query back into a string (SqlNode.unparse()). > >> > > > >> > > Now the tricky part is 3), as you have to set proper SqlNode types > as > >> > > children of the SELECT node for column names, table names and for > >> WHERE > >> > > predicate. I don't remember them properly, but what you can do is > >> parse a > >> > > proper query (again, don't validate it as you'll then need table > >> > metadata), > >> > > and check the kinds (getKind()) of children SqlNode's and replace > them > >> > with > >> > > your replacements using setters of root SqlSelect node. > >> > > > >> > > This should work, but I'm not 100% certain as I'm unable to check > >> right > >> > > now. > >> > > > >> > > PS: Take care about dialects. Dialects are used in unparsing as a > >> > > "configuration" of SqlWriter, and can unparse the syntax tree > >> differently > >> > > based on the dialect you have chosen. > >> > > > >> > > Kind regards, > >> > > Askar Bozcan > >> > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay < > >> [email protected]> > >> > > wrote: > >> > > > >> > > Hello All, > >> > > > >> > > I have just heard of Apache Calcite and was exploring the > >> possibilities. > >> > I > >> > > wish to achieve the following, and wanted to check if my hunch is > >> > correct: > >> > > - Use a template to build SQL queries, like use jinja-sql or even > >> pebble > >> > > (interpret the SqlNode tree kind of structure from my template and > >> then > >> > add > >> > > the necessary fields like table name and group by fields from an > >> input) > >> > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be inside > a > >> > query > >> > > template (in Jinja-sql it may look like "SELECT {{select_fields | > >> > sqlsafe}} > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | sqlsafe}}" and > >> > values > >> > > like 'select_fields' would be substituted at run-time from a > >> processing > >> > > engine like https://pypi.org/project/Jinja2/, but for Java) and the > >> > output > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with some > >> > dialect > >> > > (like Snowflake or Big-Query). > >> > > > >> > > Is this possible? Any recommendations or suggestions are welcome. > >> Even if > >> > > the approach feels wrong please let me know. :) > >> > > Thank you for your time and consideration. > >> > > > >> > > With regards, > >> > > Soumyadeep Mukhopadhyay. > >> > > > >> > > >> > > >
