Hey Askar,

I am happy to inform that I could make a little more progress, and here is
the gist -
https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67.

Although I am able to do this, I don't think I have reached my goal yet.
A question that came to mind was this :
- Is it not possible to add a node to a tree when the sql query has been
parsed to RelNode after validation?
- If I wish to apply a similar approach to a RelRule which I am trying to
enforce, is that not expected? (an anti-pattern)
- Is there a way to generate the RelBuilder version of the same Sql query?
(SqlToRelConverter?)

My approach in this gist :
get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==>
find SqlSelect (since there's getHaving and setHaving) ==> create a new
dummy query and extract having from the query ==>
set Having for the current query at hand via the dummy query.
Note : For no particular reason I have used only visit by SqlCall, was
trying to build up on what you shared and the other visit methods didn't
seem to fit.

The approach I have in mind for a RelRule :
use ConverterRule ==> find the node under onMatch as Aggregate ==>
then use call.transformTo to add a having clause to the existing aggregate
node if there's no having clause present using the RelBuilder, but that
does not seem feasible.
(below is what I was thinking, original query : SELECT o_custkey, COUNT(*) AS
C FROM orders GROUP BY o_custkey)
[image: Screenshot 2023-05-10 at 2.09.41 AM.png]

Maybe I am missing something. Please feel free to share your comments. :)
Thank you again!

With regards,
Soumyadeep Mukhopadhyay.

On Thu, May 4, 2023 at 7:43 AM Soumyadeep Mukhopadhyay <
[email protected]> wrote:

> Hey Askar,
>
> Thank you so much for your email. Please take as much time as you need. I
> will keep trying in the meantime. :)
>
> With regards,
> Soumyadeep Mukhopadhyay.
>
> On Thu, 4 May 2023 at 3:04 AM, Askar Bozcan <[email protected]> wrote:
>
>> Hey, sorry for the late reply but I had some stuff to do IRL & some
>> deadlines I need to meet right now. I'll try to answer you as soon as I
>> can, but this week doesn't seem possible.
>>
>> Keep the questions coming, however, if you haven't solved them yourself
>> (and if you did, please share your solutions). I'm planning to write a
>> sort-of "Comprehensive Calcite 101" documentation in the near future which
>> I am going to be basing on my own experiences & questions in the
>> community.
>>
>> Kind regards,
>> Askar
>>
>> On Mon, 24 Apr 2023 at 22:52, Soumyadeep Mukhopadhyay <
>> [email protected]>
>> wrote:
>>
>> > Hello Askar,
>> >
>> > Thank you so much for guiding me, I have taken some inspiration from
>> your
>> > code and from Dremio's implementation and compiled this -
>> >
>> https://gist.github.com/Soumyadeep-github/fdb963a4f1df2194eea3268903e3ac7b
>> > .
>> >
>> > I have a few follow-up questions, if I may, regarding what I have
>> > implemented:
>> >
>> > - is it possible to implement this as a rule, probably a RelRule (I am
>> > saying a RelRule because as far as I have understood RelRule is the way
>> you
>> > implement rules that you would like to enforce before the physical plan
>> is
>> > created)?
>> > *goal : **Find GROUP BY in a query, check if HAVING COUNT(*)>120 is
>> > present, if not then add it in the query (probably transform it).*
>> > *research done so far* : I probably need to create a RelRule and
>> implement
>> > the "matches" and "onMatch" methods. The "onMatch" method should
>> transform
>> > my rel or rex node using transformTo, probably.
>> > Something like
>> >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.html
>> > is
>> > something which I might need to implement.
>> > But a few methods are hard to grasp for me and they are:
>> >
>> >  - transformTo
>> >  - RelRule.Config.withOperandFor()
>> >
>> > Could you please give me some insight on how I can understand their
>> > structures?
>> >
>> > - is it possible to find whether a node (sql/rel) is GROUP BY or not?
>> > - if there is an AND clause in WHERE, is there a way to parse all the
>> > filter conditions separately? (using RexCall.getOperator did not work)
>> > - is this (what I have done) only a hack or is there a better way to
>> > implement this?
>> >
>> > I understand that my questions are probably not trivial, so I appreciate
>> > that you are reading my emails and providing your valuable feedback.
>> >
>> > Thank you for your time and consideration. :)
>> >
>> > With regards,
>> > Soumyadeep Mukhopadhyay.
>> >
>> >
>> >
>> > On Sun, Apr 16, 2023 at 11:34 PM Askar Bozcan <[email protected]>
>> > wrote:
>> >
>> > > Thank you for your kind words :)
>> > >
>> > > - Askar
>> > >
>> > > On 15 Apr 2023 Sat at 21:39 Soumyadeep Mukhopadhyay <
>> > [email protected]
>> > > >
>> > > wrote:
>> > >
>> > > > Hello Askar,
>> > > >
>> > > > Thank you so much for taking the time to compile this for me.
>> > > >
>> > > > I shall be indebted to you for this effort. Please let me know if I
>> can
>> > > be
>> > > > of any assistance ever to repay your kindness.
>> > > >
>> > > > I appreciate all your help and guidance. Thank you once again. I
>> shall
>> > go
>> > > > through the same and will get back to you as soon as possible. Thank
>> > you
>> > > > again sir! :)
>> > > >
>> > > > With best regards,
>> > > > Soumyadeep Mukhopadhyay.
>> > > >
>> > > >
>> > > > On Sat, 15 Apr 2023 at 11:52 PM, Askar Bozcan <
>> [email protected]>
>> > > > wrote:
>> > > >
>> > > > > Hello again!
>> > > > > I have prepared this gist to perhaps help you understand how to
>> > > traverse
>> > > > > through a Sql parse tree:
>> > > > >
>> https://gist.github.com/askarbozcan/6ffc01b465550e171a95074308cab40f
>> > > > >
>> > > > > I have implemented a Filterer class that returns false on any
>> queries
>> > > > which
>> > > > > have 'HAVING COUNT > x' where x is <= 100, and true for all other
>> > > > queries.
>> > > > > I have used Kotlin, so if you have difficulties understanding it
>> (as
>> > > > you're
>> > > > > using Scala), check out Kotlin's smart cast and nullability.
>> > > > >
>> > > > > Good luck and hopefully that was helpful,
>> > > > > Askar Bozcan
>> > > > >
>> > > > > On Wed, 12 Apr 2023 at 23:37, Soumyadeep Mukhopadhyay <
>> > > > > [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Askar,
>> > > > > >
>> > > > > > I have gone through your email a couple of times. I am trying to
>> > > > > > understand it bit by bit.
>> > > > > >
>> > > > > > I have a request, please feel free to say no, is there any code
>> > base
>> > > > > where
>> > > > > > I can refer and understand how to implement my own filter rule?
>> > > > > > What I realised I asked wrong is I wanted to implement "(HAVING
>> > > > COUNT(*)
>> > > > > > > 100)" as a global rule, if there are any aggregation queries.
>> > > > > >
>> > > > > > It is a bit difficult for me to understand where I should start.
>> > > > > > For example, if I follow this
>> > > > > https://github.com/zabetak/calcite-tutorial where
>> > > > > > shall I start looking at?
>> > > > > > In the below image I feel like some kind of comparison is going
>> on,
>> > > > > should
>> > > > > > I devise my code like this? (in order to implement something
>> like
>> > > > > "(HAVING
>> > > > > > COUNT(*) > 100)")
>> > > > > > [image: Screenshot 2023-04-11 at 5.38.43 PM.png]
>> > > > > >
>> > > > > > Also is there any documentation I can go through regarding how I
>> > can
>> > > > > > traverse through the AST?
>> > > > > > Probably I am not looking in the right places but so far I could
>> > only
>> > > > go
>> > > > > > through Tab9 code examples (or the documentation provided by the
>> > > > Calcite
>> > > > > > website) and things did not seem to be clear.
>> > > > > > I realise it is probably a lot to ask, so whatever you share
>> will
>> > be
>> > > a
>> > > > > lot
>> > > > > > of help for me.
>> > > > > >
>> > > > > > Thanks again for your time, patience and help!
>> > > > > >
>> > > > > > With regards,
>> > > > > > Soumyadeep Mukhopadhyay.
>> > > > > >
>> > > > > > On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <
>> [email protected]
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > >> Hey,
>> > > > > >> You can use Planner.reset(). Note that it has mutable state
>> > inside,
>> > > so
>> > > > > do
>> > > > > >> not reuse the same planner instance in any multi-threading
>> > > > environment.
>> > > > > >>
>> > > > > >> (I am assuming you have access to table metadata so that you
>> will
>> > be
>> > > > > able
>> > > > > >> to convert your SqlNode tree into RelNode tree, relational
>> > > expression
>> > > > > >> tree)
>> > > > > >> - Only return results above a certain threshold when using
>> GROUP
>> > BY,
>> > > > for
>> > > > > >> example (HAVING COUNT(col1) > 100).
>> > > > > >>
>> > > > > >> I'm not quite sure I understand this question is HAVING part of
>> > the
>> > > > > query?
>> > > > > >>
>> > > > > >> - Restrict the column on which joins can happen, or else throw
>> an
>> > > > error
>> > > > > >> (almost like analysisException in Spark)
>> > > > > >>
>> > > > > >> Do you have access to table/schema metadata?
>> > > > > >> If you do:
>> > > > > >> 1) Convert your parsed syntax tree (SqlNode) into a logical
>> > > relational
>> > > > > >> tree
>> > > > > >> (RelNode).
>> > > > > >> Watch this tutorial by Stamatis:
>> > > > > >> https://www.youtube.com/watch?v=p1O3E33FIs8.
>> > > > > >> It will explain way better the usage than I can in an email.
>> (And
>> > it
>> > > > > what
>> > > > > >> introduced me to Calcite's basics :))
>> > > > > >> 2) Traverse your relational tree by implementing RelShuttle
>> > > > > >> <
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html
>> > > > > >> >.
>> > > > > >> (Look up Visitor pattern to better understand how this
>> recursive
>> > > > > traversal
>> > > > > >> works)
>> > > > > >> 3) Ignore any RelNode's which are not LogicalJoin.
>> > > > > >> 4) When you encounter LogicalJoin, traverse its children with
>> > > getLeft
>> > > > > and
>> > > > > >> getRight
>> > > > > >> 5) When you encounter LogicalTableScan,  You can get its
>> > RelOptTable
>> > > > > with
>> > > > > >> getTable, and then RelOptTable.getRowType() to find the fields.
>> > > > > >> (Not 100% about step 5, but should be close to it)
>> > > > > >>
>> > > > > >>
>> > > > > >> - Restrict the columns that can be passed in a select
>> statement or
>> > > > else
>> > > > > >> throw an error (like in the point above)
>> > > > > >> Same logic as above. But instead of TableScan, look at the root
>> > > > > >> LogicalProject.
>> > > > > >>
>> > > > > >> All of the above should be doable with just the parse tree
>> > (SqlNode)
>> > > > and
>> > > > > >> without access to table metadata using SqlShuttle; however,
>> it's
>> > > > easier
>> > > > > >> and
>> > > > > >> less error-prone with relational tree IMO.
>> > > > > >>
>> > > > > >> - Askar
>> > > > > >>
>> > > > > >>
>> > > > > >> On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay <
>> > > > > >> [email protected]>
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >> > Hey Askar,
>> > > > > >> >
>> > > > > >> > It worked exactly how you suggested.
>> > > > > >> >
>> > > > > >> > ```
>> > > > > >> >
>> > > > > >> > private val sqlQuery: String = "SELECT list_of_columns FROM
>> > > > table_name
>> > > > > >> > WHERE predicate_column = 'predicate_value'"
>> > > > > >> > private val frameworkConfig: FrameworkConfig =
>> > > > > >> > Frameworks.newConfigBuilder.build()
>> > > > > >> > private val planner: Planner =
>> > > > Frameworks.getPlanner(frameworkConfig)
>> > > > > >> > private val planner2: Planner =
>> > > > Frameworks.getPlanner(frameworkConfig)
>> > > > > >> > private val planner3: Planner =
>> > > > Frameworks.getPlanner(frameworkConfig)
>> > > > > >> > private val sqlNode: SqlNode = planner.parse(sqlQuery)
>> > > > > >> > println(sqlNode.getKind)
>> > > > > >> > private val sqlSelectStmt: SqlSelect =
>> > > > sqlNode.asInstanceOf[SqlSelect]
>> > > > > >> >
>> > > > > >> > private val setSelectColumnsQuery = "SELECT age"
>> > > > > >> > private val selectList =
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList
>> > > > > >> > private val setFromTableQuery = "SELECT employee"
>> > > > > >> > private val fromTable =
>> > > > > >> >
>> > > > >
>> > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList
>> > > > > >> >
>> > > > > >> > sqlSelectStmt.setSelectList(selectList)
>> > > > > >> > sqlSelectStmt.setFrom(fromTable)
>> > > > > >> > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode]
>> > > > > >> > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT))
>> > > > > >> >
>> > > > > >> > ```
>> > > > > >> >
>> > > > > >> > The only caveat I see is the need for a new planner for every
>> > new
>> > > > > query.
>> > > > > >> > Should I do something else or is this expected?
>> > > > > >> >
>> > > > > >> > On a different note, I wanted to ask about how I can write my
>> > own
>> > > > > >> rules. Is
>> > > > > >> > it possible to enforce following rules:
>> > > > > >> > - Only return results above a certain threshold when using
>> GROUP
>> > > BY,
>> > > > > for
>> > > > > >> > example (HAVING COUNT(col1) > 100).
>> > > > > >> > - Restrict the column on which joins can happen, or else
>> throw
>> > an
>> > > > > error
>> > > > > >> > (almost like analysisException in Spark)
>> > > > > >> > - Restrict the columns that can be passed in a select
>> statement
>> > or
>> > > > > else
>> > > > > >> > throw an error (like in the point above)
>> > > > > >> >
>> > > > > >> > Not sure if it is feasible, please feel free to suggest. :)
>> > Thanks
>> > > > > again
>> > > > > >> > for your time!
>> > > > > >> >
>> > > > > >> > With regards,
>> > > > > >> > Soumyadeep Mukhopadhyay.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <
>> > [email protected]
>> > > >
>> > > > > >> wrote:
>> > > > > >> >
>> > > > > >> > > Hey Soumyadeep,
>> > > > > >> > > I think that can work with a few caveats.
>> > > > > >> > > 0) Use the Planner from Frameworks
>> > > > > >> > > <
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html
>> > > > > >> > > >
>> > > > > >> > > 1) Parse the “template query" into a syntax tree (a root
>> > > SqlNode)
>> > > > > but
>> > > > > >> do
>> > > > > >> > > not validate it
>> > > > > >> > > 2) Cast the root SqlNode into a SqlSelect
>> > > > > >> > > <
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html
>> > > > > >> > > >
>> > > > > >> > > (can
>> > > > > >> > > verify that it's a SELECT by using getKind())
>> > > > > >> > > 3) Use the setters from SqlSelect to modify the children of
>> > root
>> > > > > >> > SqlSelect
>> > > > > >> > > node.
>> > > > > >> > > 4) Unparse the root query back into a string
>> > > (SqlNode.unparse()).
>> > > > > >> > >
>> > > > > >> > > Now the tricky part is 3), as you have to set proper
>> SqlNode
>> > > types
>> > > > > as
>> > > > > >> > > children of the SELECT node for column names, table names
>> and
>> > > for
>> > > > > >> WHERE
>> > > > > >> > > predicate. I don't remember them properly, but what you
>> can do
>> > > is
>> > > > > >> parse a
>> > > > > >> > > proper query (again, don't validate it as you'll then need
>> > table
>> > > > > >> > metadata),
>> > > > > >> > > and check the kinds (getKind()) of children SqlNode's and
>> > > replace
>> > > > > them
>> > > > > >> > with
>> > > > > >> > > your replacements using setters of root SqlSelect node.
>> > > > > >> > >
>> > > > > >> > > This should work, but I'm not 100% certain as I'm unable to
>> > > check
>> > > > > >> right
>> > > > > >> > > now.
>> > > > > >> > >
>> > > > > >> > > PS: Take care about dialects. Dialects are used in
>> unparsing
>> > as
>> > > a
>> > > > > >> > > "configuration" of SqlWriter, and can unparse the syntax
>> tree
>> > > > > >> differently
>> > > > > >> > > based on the dialect you have chosen.
>> > > > > >> > >
>> > > > > >> > > Kind regards,
>> > > > > >> > > Askar Bozcan
>> > > > > >> > >
>> > > > > >> > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay <
>> > > > > >> [email protected]>
>> > > > > >> > > wrote:
>> > > > > >> > >
>> > > > > >> > > Hello All,
>> > > > > >> > >
>> > > > > >> > > I have just heard of Apache Calcite and was exploring the
>> > > > > >> possibilities.
>> > > > > >> > I
>> > > > > >> > > wish to achieve the following, and wanted to check if my
>> hunch
>> > > is
>> > > > > >> > correct:
>> > > > > >> > > - Use a template to build SQL queries, like use jinja-sql
>> or
>> > > even
>> > > > > >> pebble
>> > > > > >> > > (interpret the SqlNode tree kind of structure from my
>> template
>> > > and
>> > > > > >> then
>> > > > > >> > add
>> > > > > >> > > the necessary fields like table name and group by fields
>> from
>> > an
>> > > > > >> input)
>> > > > > >> > >
>> > > > > >> > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would
>> be
>> > > > inside
>> > > > > a
>> > > > > >> > query
>> > > > > >> > > template (in Jinja-sql it may look like "SELECT
>> > {{select_fields
>> > > |
>> > > > > >> > sqlsafe}}
>> > > > > >> > > FROM {{table_name | sqlsafe}} WHERE {{where_clause |
>> > sqlsafe}}"
>> > > > and
>> > > > > >> > values
>> > > > > >> > > like 'select_fields' would be substituted at run-time from
>> a
>> > > > > >> processing
>> > > > > >> > > engine like https://pypi.org/project/Jinja2/, but for
>> Java)
>> > and
>> > > > the
>> > > > > >> > output
>> > > > > >> > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL"
>> with
>> > > > some
>> > > > > >> > dialect
>> > > > > >> > > (like Snowflake or Big-Query).
>> > > > > >> > >
>> > > > > >> > > Is this possible? Any recommendations or suggestions are
>> > > welcome.
>> > > > > >> Even if
>> > > > > >> > > the approach feels wrong please let me know. :)
>> > > > > >> > > Thank you for your time and consideration.
>> > > > > >> > >
>> > > > > >> > > With regards,
>> > > > > >> > > Soumyadeep Mukhopadhyay.
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to