Re: Template SQL

Soumyadeep Mukhopadhyay Sat, 15 Apr 2023 11:39:21 -0700

Hello Askar,

Thank you so much for taking the time to compile this for me.


I shall be indebted to you for this effort. Please let me know if I can be
of any assistance ever to repay your kindness.

I appreciate all your help and guidance. Thank you once again. I shall go
through the same and will get back to you as soon as possible. Thank you
again sir! :)

With best regards,
Soumyadeep Mukhopadhyay.


On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <[email protected]> wrote:

> Hello again!
> I have prepared this gist to perhaps help you understand how to traverse
> through a Sql parse tree:
> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
>
> I have implemented a Filterer class that returns false on any queries which
> have 'HAVING COUNT > x' where x is <= 100, and true for all other queries.
> I have used Kotlin, so if you have difficulties understanding it (as you're
> using Scala), check out Kotlin's smart cast and nullability.
>
> Good luck and hopefully that was helpful,
> Askar Bozcan
>
> On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> [email protected]>
> wrote:
>
> > Hi Askar,
> >
> > I have gone through your email a couple of times. I am trying to
> > understand it bit by bit.
> >
> > I have a request, please feel free to say no, is there any code base
> where
> > I can refer and understand how to implement my own filter rule?
> > What I realised I asked wrong is I wanted to implement "(HAVING COUNT(*)
> > > 100)" as a global rule, if there are any aggregation queries.
> >
> > It is a bit difficult for me to understand where I should start.
> > For example, if I follow this
> https://github.com/zabetak/calcite-tutorial where
> > shall I start looking at?
> > In the below image I feel like some kind of comparison is going on,
> should
> > I devise my code like this? (in order to implement something like
> "(HAVING
> > COUNT(*) > 100)")
> > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> >
> > Also is there any documentation I can go through regarding how I can
> > traverse through the AST?
> > Probably I am not looking in the right places but so far I could only go
> > through Tab9 code examples (or the documentation provided by the Calcite
> > website) and things did not seem to be clear.
> > I realise it is probably a lot to ask, so whatever you share will be a
> lot
> > of help for me.
> >
> > Thanks again for your time, patience and help!
> >
> > With regards,
> > Soumyadeep Mukhopadhyay.
> >
> > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <[email protected]>
> wrote:
> >
> >> Hey,
> >> You can use Planner.reset(). Note that it has mutable state inside, so
> do
> >> not reuse the same planner instance in any multi-threading environment.
> >>
> >> (I am assuming you have access to table metadata so that you will be
> able
> >> to convert your SqlNode tree into RelNode tree, relational expression
> >> tree)
> >> - Only return results above a certain threshold when using GROUP BY, for
> >> example (HAVING COUNT(col1) > 100).
> >>
> >> I'm not quite sure I understand this question is HAVING part of the
> query?
> >>
> >> - Restrict the column on which joins can happen, or else throw an error
> >> (almost like analysisException in Spark)
> >>
> >> Do you have access to table/schema metadata?
> >> If you do:
> >> 1) Convert your parsed syntax tree (SqlNode) into a logical relational
> >> tree
> >> (RelNode).
> >> Watch this tutorial by Stamatis:
> >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> >> It will explain way better the usage than I can in an email. (And it
> what
> >> introduced me to Calcite's basics :))
> >> 2) Traverse your relational tree by implementing RelShuttle
> >> <
> >>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> >> >.
> >> (Look up Visitor pattern to better understand how this recursive
> traversal
> >> works)
> >> 3) Ignore any RelNode's which are not LogicalJoin.
> >> 4) When you encounter LogicalJoin, traverse its children with getLeft
> and
> >> getRight
> >> 5) When you encounter LogicalTableScan,  You can get its RelOptTable
> with
> >> getTable, and then RelOptTable.getRowType() to find the fields.
> >> (Not 100% about step 5, but should be close to it)
> >>
> >>
> >> - Restrict the columns that can be passed in a select statement or else
> >> throw an error (like in the point above)
> >> Same logic as above. But instead of TableScan, look at the root
> >> LogicalProject.
> >>
> >> All of the above should be doable with just the parse tree (SqlNode) and
> >> without access to table metadata using SqlShuttle; however, it's easier
> >> and
> >> less error-prone with relational tree IMO.
> >>
> >> - Askar
> >>
> >>
> >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> >> [email protected]>
> >> wrote:
> >>
> >> > Hey Askar,
> >> >
> >> > It worked exactly how you suggested.
> >> >
> >> > ```
> >> >
> >> > private val sqlQuery: String = "SELECT list_of_columns FROM table_name
> >> > WHERE predicate_column = 'predicate_value'"
> >> > private val frameworkConfig: FrameworkConfig =
> >> > Frameworks.newConfigBuilder.build()
> >> > private val planner: Planner = Frameworks.getPlanner(frameworkConfig)
> >> > private val planner2: Planner = Frameworks.getPlanner(frameworkConfig)
> >> > private val planner3: Planner = Frameworks.getPlanner(frameworkConfig)
> >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> >> > println(sqlNode.getKind)
> >> > private val sqlSelectStmt: SqlSelect = sqlNode.asInstanceOf[SqlSelect]
> >> >
> >> > private val setSelectColumnsQuery = "SELECT age"
> >> > private val selectList =
> >> >
> >>
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> >> > private val setFromTableQuery = "SELECT employee"
> >> > private val fromTable =
> >> >
> planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> >> >
> >> > sqlSelectStmt.setSelectList(selectList)
> >> > sqlSelectStmt.setFrom(fromTable)
> >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
> >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> >> >
> >> > ```
> >> >
> >> > The only caveat I see is the need for a new planner for every new
> query.
> >> > Should I do something else or is this expected?
> >> >
> >> > On a different note, I wanted to ask about how I can write my own
> >> rules. Is
> >> > it possible to enforce following rules:
> >> > - Only return results above a certain threshold when using GROUP BY,
> for
> >> > example (HAVING COUNT(col1) > 100).
> >> > - Restrict the column on which joins can happen, or else throw an
> error
> >> > (almost like analysisException in Spark)
> >> > - Restrict the columns that can be passed in a select statement or
> else
> >> > throw an error (like in the point above)
> >> >
> >> > Not sure if it is feasible, please feel free to suggest. :) Thanks
> again
> >> > for your time!
> >> >
> >> > With regards,
> >> > Soumyadeep Mukhopadhyay.
> >> >
> >> >
> >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <[email protected]>
> >> wrote:
> >> >
> >> > > Hey Soumyadeep,
> >> > > I think that can work with a few caveats.
> >> > > 0) Use the Planner from Frameworks
> >> > > <
> >> > >
> >> >
> >>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> >> > > >
> >> > > 1) Parse the “template query" into a syntax tree (a root SqlNode)
> but
> >> do
> >> > > not validate it
> >> > > 2) Cast the root SqlNode into a SqlSelect
> >> > > <
> >> > >
> >> >
> >>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> >> > > >
> >> > > (can
> >> > > verify that it's a SELECT by using getKind())
> >> > > 3) Use the setters from SqlSelect to modify the children of root
> >> > SqlSelect
> >> > > node.
> >> > > 4) Unparse the root query back into a string (SqlNode.unparse()).
> >> > >
> >> > > Now the tricky part is 3), as you have to set proper SqlNode types
> as
> >> > > children of the SELECT node for column names, table names and for
> >> WHERE
> >> > > predicate. I don't remember them properly, but what you can do is
> >> parse a
> >> > > proper query (again, don't validate it as you'll then need table
> >> > metadata),
> >> > > and check the kinds (getKind()) of children SqlNode's and replace
> them
> >> > with
> >> > > your replacements using setters of root SqlSelect node.
> >> > >
> >> > > This should work, but I'm not 100% certain as I'm unable to check
> >> right
> >> > > now.
> >> > >
> >> > > PS: Take care about dialects. Dialects are used in unparsing as a
> >> > > "configuration" of SqlWriter, and can unparse the syntax tree
> >> differently
> >> > > based on the dialect you have chosen.
> >> > >
> >> > > Kind regards,
> >> > > Askar Bozcan
> >> > >
> >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> >> [email protected]>
> >> > > wrote:
> >> > >
> >> > > Hello All,
> >> > >
> >> > > I have just heard of Apache Calcite and was exploring the
> >> possibilities.
> >> > I
> >> > > wish to achieve the following, and wanted to check if my hunch is
> >> > correct:
> >> > > - Use a template to build SQL queries, like use jinja-sql or even
> >> pebble
> >> > > (interpret the SqlNode tree kind of structure from my template and
> >> then
> >> > add
> >> > > the necessary fields like table name and group by fields from an
> >> input)
> >> > >
> >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be inside
> a
> >> > query
> >> > > template (in Jinja-sql it may look like "SELECT {{select_fields |
> >> > sqlsafe}}
> >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | sqlsafe}}" and
> >> > values
> >> > > like 'select_fields' would be substituted at run-time from a
> >> processing
> >> > > engine like https://pypi.org/project/Jinja2/, but for Java) and the
> >> > output
> >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with some
> >> > dialect
> >> > > (like Snowflake or Big-Query).
> >> > >
> >> > > Is this possible? Any recommendations or suggestions are welcome.
> >> Even if
> >> > > the approach feels wrong please let me know. :)
> >> > > Thank you for your time and consideration.
> >> > >
> >> > > With regards,
> >> > > Soumyadeep Mukhopadhyay.
> >> > >
> >> >
> >>
> >
>

Re: Template SQL

Reply via email to