Re: Template SQL

Soumyadeep Mukhopadhyay Mon, 24 Apr 2023 12:51:29 -0700

Hello Askar,

Thank you so much for guiding me, I have taken some inspiration from your
code and from Dremio's implementation and compiled this -
https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b.


I have a few follow-up questions, if I may, regarding what I have
implemented:

- is it possible to implement this as a rule, probably a RelRule (I am
saying a RelRule because as far as I have understood RelRule is the way you
implement rules that you would like to enforce before the physical plan is
created)?
*goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
present, if not then add it in the query (probably transform it).*
*research done so far* : I probably need to create a RelRule and implement
the "matches" and "onMatch" methods. The "onMatch" method should transform
my rel or rex node using transformTo, probably.
Something like
https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
is
something which I might need to implement.
But a few methods are hard to grasp for me and they are:

 - transformTo
 - RelRule.Config.withOperandFor()

Could you please give me some insight on how I can understand their
structures?

- is it possible to find whether a node (sql/rel) is GROUP BY or not?
- if there is an AND clause in WHERE, is there a way to parse all the
filter conditions separately? (using RexCall.getOperator did not work)
- is this (what I have done) only a hack or is there a better way to
implement this?

I understand that my questions are probably not trivial, so I appreciate
that you are reading my emails and providing your valuable feedback.

Thank you for your time and consideration. :)

With regards,
Soumyadeep Mukhopadhyay.



On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <[email protected]> wrote:

> Thank you for your kind words :)
>
> - Askar
>
> On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <[email protected]
> >
> wrote:
>
> > Hello Askar,
> >
> > Thank you so much for taking the time to compile this for me.
> >
> > I shall be indebted to you for this effort. Please let me know if I can
> be
> > of any assistance ever to repay your kindness.
> >
> > I appreciate all your help and guidance. Thank you once again. I shall go
> > through the same and will get back to you as soon as possible. Thank you
> > again sir! :)
> >
> > With best regards,
> > Soumyadeep Mukhopadhyay.
> >
> >
> > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <[email protected]>
> > wrote:
> >
> > > Hello again!
> > > I have prepared this gist to perhaps help you understand how to
> traverse
> > > through a Sql parse tree:
> > > https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
> > >
> > > I have implemented a Filterer class that returns false on any queries
> > which
> > > have 'HAVING COUNT > x' where x is <= 100, and true for all other
> > queries.
> > > I have used Kotlin, so if you have difficulties understanding it (as
> > you're
> > > using Scala), check out Kotlin's smart cast and nullability.
> > >
> > > Good luck and hopefully that was helpful,
> > > Askar Bozcan
> > >
> > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
> > > [email protected]>
> > > wrote:
> > >
> > > > Hi Askar,
> > > >
> > > > I have gone through your email a couple of times. I am trying to
> > > > understand it bit by bit.
> > > >
> > > > I have a request, please feel free to say no, is there any code base
> > > where
> > > > I can refer and understand how to implement my own filter rule?
> > > > What I realised I asked wrong is I wanted to implement "(HAVING
> > COUNT(*)
> > > > > 100)" as a global rule, if there are any aggregation queries.
> > > >
> > > > It is a bit difficult for me to understand where I should start.
> > > > For example, if I follow this
> > > https://github.com/zabetak/calcite-tutorial where
> > > > shall I start looking at?
> > > > In the below image I feel like some kind of comparison is going on,
> > > should
> > > > I devise my code like this? (in order to implement something like
> > > "(HAVING
> > > > COUNT(*) > 100)")
> > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
> > > >
> > > > Also is there any documentation I can go through regarding how I can
> > > > traverse through the AST?
> > > > Probably I am not looking in the right places but so far I could only
> > go
> > > > through Tab9 code examples (or the documentation provided by the
> > Calcite
> > > > website) and things did not seem to be clear.
> > > > I realise it is probably a lot to ask, so whatever you share will be
> a
> > > lot
> > > > of help for me.
> > > >
> > > > Thanks again for your time, patience and help!
> > > >
> > > > With regards,
> > > > Soumyadeep Mukhopadhyay.
> > > >
> > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <[email protected]>
> > > wrote:
> > > >
> > > >> Hey,
> > > >> You can use Planner.reset(). Note that it has mutable state inside,
> so
> > > do
> > > >> not reuse the same planner instance in any multi-threading
> > environment.
> > > >>
> > > >> (I am assuming you have access to table metadata so that you will be
> > > able
> > > >> to convert your SqlNode tree into RelNode tree, relational
> expression
> > > >> tree)
> > > >> - Only return results above a certain threshold when using GROUP BY,
> > for
> > > >> example (HAVING COUNT(col1) > 100).
> > > >>
> > > >> I'm not quite sure I understand this question is HAVING part of the
> > > query?
> > > >>
> > > >> - Restrict the column on which joins can happen, or else throw an
> > error
> > > >> (almost like analysisException in Spark)
> > > >>
> > > >> Do you have access to table/schema metadata?
> > > >> If you do:
> > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
> relational
> > > >> tree
> > > >> (RelNode).
> > > >> Watch this tutorial by Stamatis:
> > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
> > > >> It will explain way better the usage than I can in an email. (And it
> > > what
> > > >> introduced me to Calcite's basics :))
> > > >> 2) Traverse your relational tree by implementing RelShuttle
> > > >> <
> > > >>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
> > > >> >.
> > > >> (Look up Visitor pattern to better understand how this recursive
> > > traversal
> > > >> works)
> > > >> 3) Ignore any RelNode's which are not LogicalJoin.
> > > >> 4) When you encounter LogicalJoin, traverse its children with
> getLeft
> > > and
> > > >> getRight
> > > >> 5) When you encounter LogicalTableScan,  You can get its RelOptTable
> > > with
> > > >> getTable, and then RelOptTable.getRowType() to find the fields.
> > > >> (Not 100% about step 5, but should be close to it)
> > > >>
> > > >>
> > > >> - Restrict the columns that can be passed in a select statement or
> > else
> > > >> throw an error (like in the point above)
> > > >> Same logic as above. But instead of TableScan, look at the root
> > > >> LogicalProject.
> > > >>
> > > >> All of the above should be doable with just the parse tree (SqlNode)
> > and
> > > >> without access to table metadata using SqlShuttle; however, it's
> > easier
> > > >> and
> > > >> less error-prone with relational tree IMO.
> > > >>
> > > >> - Askar
> > > >>
> > > >>
> > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
> > > >> [email protected]>
> > > >> wrote:
> > > >>
> > > >> > Hey Askar,
> > > >> >
> > > >> > It worked exactly how you suggested.
> > > >> >
> > > >> > ```
> > > >> >
> > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
> > table_name
> > > >> > WHERE predicate_column = 'predicate_value'"
> > > >> > private val frameworkConfig: FrameworkConfig =
> > > >> > Frameworks.newConfigBuilder.build()
> > > >> > private val planner: Planner =
> > Frameworks.getPlanner(frameworkConfig)
> > > >> > private val planner2: Planner =
> > Frameworks.getPlanner(frameworkConfig)
> > > >> > private val planner3: Planner =
> > Frameworks.getPlanner(frameworkConfig)
> > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
> > > >> > println(sqlNode.getKind)
> > > >> > private val sqlSelectStmt: SqlSelect =
> > sqlNode.asInstanceOf[SqlSelect]
> > > >> >
> > > >> > private val setSelectColumnsQuery = "SELECT age"
> > > >> > private val selectList =
> > > >> >
> > > >>
> > >
> >
> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
> > > >> > private val setFromTableQuery = "SELECT employee"
> > > >> > private val fromTable =
> > > >> >
> > > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
> > > >> >
> > > >> > sqlSelectStmt.setSelectList(selectList)
> > > >> > sqlSelectStmt.setFrom(fromTable)
> > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
> > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
> > > >> >
> > > >> > ```
> > > >> >
> > > >> > The only caveat I see is the need for a new planner for every new
> > > query.
> > > >> > Should I do something else or is this expected?
> > > >> >
> > > >> > On a different note, I wanted to ask about how I can write my own
> > > >> rules. Is
> > > >> > it possible to enforce following rules:
> > > >> > - Only return results above a certain threshold when using GROUP
> BY,
> > > for
> > > >> > example (HAVING COUNT(col1) > 100).
> > > >> > - Restrict the column on which joins can happen, or else throw an
> > > error
> > > >> > (almost like analysisException in Spark)
> > > >> > - Restrict the columns that can be passed in a select statement or
> > > else
> > > >> > throw an error (like in the point above)
> > > >> >
> > > >> > Not sure if it is feasible, please feel free to suggest. :) Thanks
> > > again
> > > >> > for your time!
> > > >> >
> > > >> > With regards,
> > > >> > Soumyadeep Mukhopadhyay.
> > > >> >
> > > >> >
> > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <[email protected]
> >
> > > >> wrote:
> > > >> >
> > > >> > > Hey Soumyadeep,
> > > >> > > I think that can work with a few caveats.
> > > >> > > 0) Use the Planner from Frameworks
> > > >> > > <
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
> > > >> > > >
> > > >> > > 1) Parse the “template query" into a syntax tree (a root
> SqlNode)
> > > but
> > > >> do
> > > >> > > not validate it
> > > >> > > 2) Cast the root SqlNode into a SqlSelect
> > > >> > > <
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
> > > >> > > >
> > > >> > > (can
> > > >> > > verify that it's a SELECT by using getKind())
> > > >> > > 3) Use the setters from SqlSelect to modify the children of root
> > > >> > SqlSelect
> > > >> > > node.
> > > >> > > 4) Unparse the root query back into a string
> (SqlNode.unparse()).
> > > >> > >
> > > >> > > Now the tricky part is 3), as you have to set proper SqlNode
> types
> > > as
> > > >> > > children of the SELECT node for column names, table names and
> for
> > > >> WHERE
> > > >> > > predicate. I don't remember them properly, but what you can do
> is
> > > >> parse a
> > > >> > > proper query (again, don't validate it as you'll then need table
> > > >> > metadata),
> > > >> > > and check the kinds (getKind()) of children SqlNode's and
> replace
> > > them
> > > >> > with
> > > >> > > your replacements using setters of root SqlSelect node.
> > > >> > >
> > > >> > > This should work, but I'm not 100% certain as I'm unable to
> check
> > > >> right
> > > >> > > now.
> > > >> > >
> > > >> > > PS: Take care about dialects. Dialects are used in unparsing as
> a
> > > >> > > "configuration" of SqlWriter, and can unparse the syntax tree
> > > >> differently
> > > >> > > based on the dialect you have chosen.
> > > >> > >
> > > >> > > Kind regards,
> > > >> > > Askar Bozcan
> > > >> > >
> > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
> > > >> [email protected]>
> > > >> > > wrote:
> > > >> > >
> > > >> > > Hello All,
> > > >> > >
> > > >> > > I have just heard of Apache Calcite and was exploring the
> > > >> possibilities.
> > > >> > I
> > > >> > > wish to achieve the following, and wanted to check if my hunch
> is
> > > >> > correct:
> > > >> > > - Use a template to build SQL queries, like use jinja-sql or
> even
> > > >> pebble
> > > >> > > (interpret the SqlNode tree kind of structure from my template
> and
> > > >> then
> > > >> > add
> > > >> > > the necessary fields like table name and group by fields from an
> > > >> input)
> > > >> > >
> > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be
> > inside
> > > a
> > > >> > query
> > > >> > > template (in Jinja-sql it may look like "SELECT {{select_fields
> |
> > > >> > sqlsafe}}
> > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | sqlsafe}}"
> > and
> > > >> > values
> > > >> > > like 'select_fields' would be substituted at run-time from a
> > > >> processing
> > > >> > > engine like https://pypi.org/project/Jinja2/, but for Java) and
> > the
> > > >> > output
> > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with
> > some
> > > >> > dialect
> > > >> > > (like Snowflake or Big-Query).
> > > >> > >
> > > >> > > Is this possible? Any recommendations or suggestions are
> welcome.
> > > >> Even if
> > > >> > > the approach feels wrong please let me know. :)
> > > >> > > Thank you for your time and consideration.
> > > >> > >
> > > >> > > With regards,
> > > >> > > Soumyadeep Mukhopadhyay.
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: Template SQL

Reply via email to