Hello again!
I have prepared this gist to perhaps help you understand how to traverse
through a Sql parse tree:
https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f

I have implemented a Filterer class that returns false on any queries which
have 'HAVING COUNT > x' where x is <= 100, and true for all other queries.
I have used Kotlin, so if you have difficulties understanding it (as you're
using Scala), check out Kotlin's smart cast and nullability.

Good luck and hopefully that was helpful,
Askar Bozcan

On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <[email protected]>
wrote:

> Hi Askar,
>
> I have gone through your email a couple of times. I am trying to
> understand it bit by bit.
>
> I have a request, please feel free to say no, is there any code base where
> I can refer and understand how to implement my own filter rule?
> What I realised I asked wrong is I wanted to implement "(HAVING COUNT(*)
> > 100)" as a global rule, if there are any aggregation queries.
>
> It is a bit difficult for me to understand where I should start.
> For example, if I follow this https://github.com/zabetak/calcite-tutorial 
> where
> shall I start looking at?
> In the below image I feel like some kind of comparison is going on, should
> I devise my code like this? (in order to implement something like "(HAVING
> COUNT(*) > 100)")
> [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
>
> Also is there any documentation I can go through regarding how I can
> traverse through the AST?
> Probably I am not looking in the right places but so far I could only go
> through Tab9 code examples (or the documentation provided by the Calcite
> website) and things did not seem to be clear.
> I realise it is probably a lot to ask, so whatever you share will be a lot
> of help for me.
>
> Thanks again for your time, patience and help!
>
> With regards,
> Soumyadeep Mukhopadhyay.
>
> On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <[email protected]> wrote:
>
>> Hey,
>> You can use Planner.reset(). Note that it has mutable state inside, so do
>> not reuse the same planner instance in any multi-threading environment.
>>
>> (I am assuming you have access to table metadata so that you will be able
>> to convert your SqlNode tree into RelNode tree, relational expression
>> tree)
>> - Only return results above a certain threshold when using GROUP BY, for
>> example (HAVING COUNT(col1) > 100).
>>
>> I'm not quite sure I understand this question is HAVING part of the query?
>>
>> - Restrict the column on which joins can happen, or else throw an error
>> (almost like analysisException in Spark)
>>
>> Do you have access to table/schema metadata?
>> If you do:
>> 1) Convert your parsed syntax tree (SqlNode) into a logical relational
>> tree
>> (RelNode).
>> Watch this tutorial by Stamatis:
>> https://www.youtube.com/watch?v=p1O3E33FIs8.
>> It will explain way better the usage than I can in an email. (And it what
>> introduced me to Calcite's basics :))
>> 2) Traverse your relational tree by implementing RelShuttle
>> <
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
>> >.
>> (Look up Visitor pattern to better understand how this recursive traversal
>> works)
>> 3) Ignore any RelNode's which are not LogicalJoin.
>> 4) When you encounter LogicalJoin, traverse its children with getLeft and
>> getRight
>> 5) When you encounter LogicalTableScan,  You can get its RelOptTable with
>> getTable, and then RelOptTable.getRowType() to find the fields.
>> (Not 100% about step 5, but should be close to it)
>>
>>
>> - Restrict the columns that can be passed in a select statement or else
>> throw an error (like in the point above)
>> Same logic as above. But instead of TableScan, look at the root
>> LogicalProject.
>>
>> All of the above should be doable with just the parse tree (SqlNode) and
>> without access to table metadata using SqlShuttle; however, it's easier
>> and
>> less error-prone with relational tree IMO.
>>
>> - Askar
>>
>>
>> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
>> [email protected]>
>> wrote:
>>
>> > Hey Askar,
>> >
>> > It worked exactly how you suggested.
>> >
>> > ```
>> >
>> > private val sqlQuery: String = "SELECT list_of_columns FROM table_name
>> > WHERE predicate_column = 'predicate_value'"
>> > private val frameworkConfig: FrameworkConfig =
>> > Frameworks.newConfigBuilder.build()
>> > private val planner: Planner = Frameworks.getPlanner(frameworkConfig)
>> > private val planner2: Planner = Frameworks.getPlanner(frameworkConfig)
>> > private val planner3: Planner = Frameworks.getPlanner(frameworkConfig)
>> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
>> > println(sqlNode.getKind)
>> > private val sqlSelectStmt: SqlSelect = sqlNode.asInstanceOf[SqlSelect]
>> >
>> > private val setSelectColumnsQuery = "SELECT age"
>> > private val selectList =
>> >
>> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
>> > private val setFromTableQuery = "SELECT employee"
>> > private val fromTable =
>> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
>> >
>> > sqlSelectStmt.setSelectList(selectList)
>> > sqlSelectStmt.setFrom(fromTable)
>> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
>> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
>> >
>> > ```
>> >
>> > The only caveat I see is the need for a new planner for every new query.
>> > Should I do something else or is this expected?
>> >
>> > On a different note, I wanted to ask about how I can write my own
>> rules. Is
>> > it possible to enforce following rules:
>> > - Only return results above a certain threshold when using GROUP BY, for
>> > example (HAVING COUNT(col1) > 100).
>> > - Restrict the column on which joins can happen, or else throw an error
>> > (almost like analysisException in Spark)
>> > - Restrict the columns that can be passed in a select statement or else
>> > throw an error (like in the point above)
>> >
>> > Not sure if it is feasible, please feel free to suggest. :) Thanks again
>> > for your time!
>> >
>> > With regards,
>> > Soumyadeep Mukhopadhyay.
>> >
>> >
>> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <[email protected]>
>> wrote:
>> >
>> > > Hey Soumyadeep,
>> > > I think that can work with a few caveats.
>> > > 0) Use the Planner from Frameworks
>> > > <
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
>> > > >
>> > > 1) Parse the “template query" into a syntax tree (a root SqlNode) but
>> do
>> > > not validate it
>> > > 2) Cast the root SqlNode into a SqlSelect
>> > > <
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
>> > > >
>> > > (can
>> > > verify that it's a SELECT by using getKind())
>> > > 3) Use the setters from SqlSelect to modify the children of root
>> > SqlSelect
>> > > node.
>> > > 4) Unparse the root query back into a string (SqlNode.unparse()).
>> > >
>> > > Now the tricky part is 3), as you have to set proper SqlNode types as
>> > > children of the SELECT node for column names, table names and for
>> WHERE
>> > > predicate. I don't remember them properly, but what you can do is
>> parse a
>> > > proper query (again, don't validate it as you'll then need table
>> > metadata),
>> > > and check the kinds (getKind()) of children SqlNode's and replace them
>> > with
>> > > your replacements using setters of root SqlSelect node.
>> > >
>> > > This should work, but I'm not 100% certain as I'm unable to check
>> right
>> > > now.
>> > >
>> > > PS: Take care about dialects. Dialects are used in unparsing as a
>> > > "configuration" of SqlWriter, and can unparse the syntax tree
>> differently
>> > > based on the dialect you have chosen.
>> > >
>> > > Kind regards,
>> > > Askar Bozcan
>> > >
>> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
>> [email protected]>
>> > > wrote:
>> > >
>> > > Hello All,
>> > >
>> > > I have just heard of Apache Calcite and was exploring the
>> possibilities.
>> > I
>> > > wish to achieve the following, and wanted to check if my hunch is
>> > correct:
>> > > - Use a template to build SQL queries, like use jinja-sql or even
>> pebble
>> > > (interpret the SqlNode tree kind of structure from my template and
>> then
>> > add
>> > > the necessary fields like table name and group by fields from an
>> input)
>> > >
>> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be inside a
>> > query
>> > > template (in Jinja-sql it may look like "SELECT {{select_fields |
>> > sqlsafe}}
>> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | sqlsafe}}" and
>> > values
>> > > like 'select_fields' would be substituted at run-time from a
>> processing
>> > > engine like https://pypi.org/project/Jinja2/, but for Java) and the
>> > output
>> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with some
>> > dialect
>> > > (like Snowflake or Big-Query).
>> > >
>> > > Is this possible? Any recommendations or suggestions are welcome.
>> Even if
>> > > the approach feels wrong please let me know. :)
>> > > Thank you for your time and consideration.
>> > >
>> > > With regards,
>> > > Soumyadeep Mukhopadhyay.
>> > >
>> >
>>
>

Reply via email to