Re: Template SQL

Soumyadeep Mukhopadhyay Wed, 12 Apr 2023 13:37:48 -0700

Hi Askar,

I have gone through your email a couple of times. I am trying to understand
it bit by bit.


I have a request, please feel free to say no, is there any code base where
I can refer and understand how to implement my own filter rule?
What I realised I asked wrong is I wanted to implement "(HAVING COUNT(*) >
100)" as a global rule, if there are any aggregation queries.

It is a bit difficult for me to understand where I should start.
For example, if I follow this https://github.com/zabetak/calcite-tutorial where
shall I start looking at?
In the below image I feel like some kind of comparison is going on, should
I devise my code like this? (in order to implement something like "(HAVING
COUNT(*) > 100)")
[image: Screenshot 2023-04-11 at 5.38.43 PM.png]

Also is there any documentation I can go through regarding how I can
traverse through the AST?
Probably I am not looking in the right places but so far I could only go
through Tab9 code examples (or the documentation provided by the Calcite
website) and things did not seem to be clear.
I realise it is probably a lot to ask, so whatever you share will be a lot
of help for me.

Thanks again for your time, patience and help!

With regards,
Soumyadeep Mukhopadhyay.

On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <[email protected]> wrote:

> Hey,
> You can use Planner.reset(). Note that it has mutable state inside, so do
> not reuse the same planner instance in any multi-threading environment.
>
> (I am assuming you have access to table metadata so that you will be able
> to convert your SqlNode tree into RelNode tree, relational expression tree)
> - Only return results above a certain threshold when using GROUP BY, for
> example (HAVING COUNT(col1) > 100).
>
> I'm not quite sure I understand this question is HAVING part of the query?
>
> - Restrict the column on which joins can happen, or else throw an error
> (almost like analysisException in Spark)
>
> Do you have access to table/schema metadata?
> If you do:
> 1) Convert your parsed syntax tree (SqlNode) into a logical relational tree
> (RelNode).
> Watch this tutorial by Stamatis:
> https://www.youtube.com/watch?v=p1O3E33FIs8.
> It will explain way better the usage than I can in an email. (And it what
> introduced me to Calcite's basics :))
> 2) Traverse your relational tree by implementing RelShuttle
> <
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> >.
> (Look up Visitor pattern to better understand how this recursive traversal
> works)
> 3) Ignore any RelNode's which are not LogicalJoin.
> 4) When you encounter LogicalJoin, traverse its children with getLeft and
> getRight
> 5) When you encounter LogicalTableScan,  You can get its RelOptTable with
> getTable, and then RelOptTable.getRowType() to find the fields.
> (Not 100% about step 5, but should be close to it)
>
>
> - Restrict the columns that can be passed in a select statement or else
> throw an error (like in the point above)
> Same logic as above. But instead of TableScan, look at the root
> LogicalProject.
>
> All of the above should be doable with just the parse tree (SqlNode) and
> without access to table metadata using SqlShuttle; however, it's easier and
> less error-prone with relational tree IMO.
>
> - Askar
>
>
> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> [email protected]>
> wrote:
>
> > Hey Askar,
> >
> > It worked exactly how you suggested.
> >
> > ```
> >
> > private val sqlQuery: String = "SELECT list_of_columns FROM table_name
> > WHERE predicate_column = 'predicate_value'"
> > private val frameworkConfig: FrameworkConfig =
> > Frameworks.newConfigBuilder.build()
> > private val planner: Planner = Frameworks.getPlanner(frameworkConfig)
> > private val planner2: Planner = Frameworks.getPlanner(frameworkConfig)
> > private val planner3: Planner = Frameworks.getPlanner(frameworkConfig)
> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> > println(sqlNode.getKind)
> > private val sqlSelectStmt: SqlSelect = sqlNode.asInstanceOf[SqlSelect]
> >
> > private val setSelectColumnsQuery = "SELECT age"
> > private val selectList =
> >
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> > private val setFromTableQuery = "SELECT employee"
> > private val fromTable =
> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> >
> > sqlSelectStmt.setSelectList(selectList)
> > sqlSelectStmt.setFrom(fromTable)
> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> >
> > ```
> >
> > The only caveat I see is the need for a new planner for every new query.
> > Should I do something else or is this expected?
> >
> > On a different note, I wanted to ask about how I can write my own rules.
> Is
> > it possible to enforce following rules:
> > - Only return results above a certain threshold when using GROUP BY, for
> > example (HAVING COUNT(col1) > 100).
> > - Restrict the column on which joins can happen, or else throw an error
> > (almost like analysisException in Spark)
> > - Restrict the columns that can be passed in a select statement or else
> > throw an error (like in the point above)
> >
> > Not sure if it is feasible, please feel free to suggest. :) Thanks again
> > for your time!
> >
> > With regards,
> > Soumyadeep Mukhopadhyay.
> >
> >
> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <[email protected]>
> wrote:
> >
> > > Hey Soumyadeep,
> > > I think that can work with a few caveats.
> > > 0) Use the Planner from Frameworks
> > > <
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> > > >
> > > 1) Parse the “template query" into a syntax tree (a root SqlNode) but
> do
> > > not validate it
> > > 2) Cast the root SqlNode into a SqlSelect
> > > <
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> > > >
> > > (can
> > > verify that it's a SELECT by using getKind())
> > > 3) Use the setters from SqlSelect to modify the children of root
> > SqlSelect
> > > node.
> > > 4) Unparse the root query back into a string (SqlNode.unparse()).
> > >
> > > Now the tricky part is 3), as you have to set proper SqlNode types as
> > > children of the SELECT node for column names, table names and for WHERE
> > > predicate. I don't remember them properly, but what you can do is
> parse a
> > > proper query (again, don't validate it as you'll then need table
> > metadata),
> > > and check the kinds (getKind()) of children SqlNode's and replace them
> > with
> > > your replacements using setters of root SqlSelect node.
> > >
> > > This should work, but I'm not 100% certain as I'm unable to check right
> > > now.
> > >
> > > PS: Take care about dialects. Dialects are used in unparsing as a
> > > "configuration" of SqlWriter, and can unparse the syntax tree
> differently
> > > based on the dialect you have chosen.
> > >
> > > Kind regards,
> > > Askar Bozcan
> > >
> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> [email protected]>
> > > wrote:
> > >
> > > Hello All,
> > >
> > > I have just heard of Apache Calcite and was exploring the
> possibilities.
> > I
> > > wish to achieve the following, and wanted to check if my hunch is
> > correct:
> > > - Use a template to build SQL queries, like use jinja-sql or even
> pebble
> > > (interpret the SqlNode tree kind of structure from my template and then
> > add
> > > the necessary fields like table name and group by fields from an input)
> > >
> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be inside a
> > query
> > > template (in Jinja-sql it may look like "SELECT {{select_fields |
> > sqlsafe}}
> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | sqlsafe}}" and
> > values
> > > like 'select_fields' would be substituted at run-time from a processing
> > > engine like https://pypi.org/project/Jinja2/, but for Java) and the
> > output
> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with some
> > dialect
> > > (like Snowflake or Big-Query).
> > >
> > > Is this possible? Any recommendations or suggestions are welcome. Even
> if
> > > the approach feels wrong please let me know. :)
> > > Thank you for your time and consideration.
> > >
> > > With regards,
> > > Soumyadeep Mukhopadhyay.
> > >
> >
>

Re: Template SQL

Reply via email to