RE: SQL Parsing

mbudiu Mon, 02 Oct 2023 14:43:27 -0700

Maybe I will write a blog post about this (I couldn't find any documentation 
with a web search), but for now here is a short summary. 
Hopefully nothing I say here is wrong; this is relatively new to me too.

Calcite uses a two-step code generation process (FreeMarker, then JavaCC) to 
create the parser:

- it starts with Parser.jj: 
https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj
This is the base parser written in the JavaCC language, plus directives in a 
template language that make it extensible.
The lines starting with "<#", e.g. 
https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L32
are for the FreeMarker template engine: 
https://freemarker.apache.org/docs/index.html

- it runs FreeMarker to process Parser.jj; the template engine is controlled by 
config.fmpp 
https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/config.fmpp
and *.ftl files 
https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/babel/src/main/codegen/includes/parserImpls.ftl

The config file controls the way extensions are applied, and the ftl files 
contain the code for the extensions. The ftl files are really in the JavaCC 
language, but they are only code fragments.

- the JavaCC parser generator is executed on the generated file to generate a 
Java file, e.g., SqlBabelParserImpl.java (the actual name comes from the 
configuration file).
This file is not part of the repository, but you can find it after you run the 
Calcite build process. This file contains the real parser which gets compiled 
by the Java compiler and linked in the resulting Jar.
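
To make the first step concrete, here is a toy Java sketch of template 
expansion. It does not use FreeMarker or JavaCC: the "@@name@@" placeholder 
syntax, the class TemplateStepSketch, and the grammar snippets are all made up 
for illustration, and Calcite's real templates use FreeMarker directives 
instead. It only shows the idea that the base grammar and the extension 
fragments are merged into a single grammar file before the parser generator 
sees anything.

```java
import java.util.List;
import java.util.Map;

/** Toy stand-in for the template step; it does not use FreeMarker or JavaCC. */
final class TemplateStepSketch {

    /** Replaces each "@@name@@" placeholder with the fragments registered under that name. */
    static String expand(String template, Map<String, List<String>> fragments) {
        String result = template;
        for (Map.Entry<String, List<String>> entry : fragments.entrySet()) {
            // Fragments for one extension point are joined one per line.
            String joined = String.join("\n", entry.getValue());
            result = result.replace("@@" + entry.getKey() + "@@", joined);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical base grammar with a single extension point.
        String baseGrammar =
                "void Statement() : {} {\n  Select()\n@@extraStatements@@\n}";
        // Hypothetical extension: a fragment contributed by a dialect.
        Map<String, List<String>> config =
                Map.of("extraStatements", List.of("  | CreateTable()"));
        // The expanded text is what a parser generator would receive.
        System.out.println(expand(baseGrammar, config));
    }
}
```

The fragments make no sense on their own, which matches the remark above that 
the ftl files are only code fragments.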

That's why you need to involve the build system to customize the parser; you 
can't do it in pure Java. In our Calcite-based project the build system uses 
Maven, but Calcite itself uses Gradle; the way you set up this workflow will 
depend on your tools.

You can have one or multiple parsers in your executable, depending on which 
Calcite jars you choose to include. The server jar contains the DDL SQL 
language, while Babel has a very lenient parser which accepts constructs from 
many dialects. The core parser only has the SQL *query* language. In our 
project we have combined Babel + DDL in one parser: 
https://github.com/feldera/feldera/pull/276

In general the parser extensions are fairly small compared to the original 
parser, and they should be easy to read and understand if you know how parser 
generators work.

As you can see, the original JJ file has some predefined extensibility points; 
if your desired extensions only need these, it should be easy to add them. I 
suspect adding a new syntax for generic types should be doable in the existing 
framework.
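
As a rough illustration of what a rule for generic types has to recognize, 
here is a small recursive-descent sketch in plain Java. It is not a Calcite 
parser extension (a real one would be a JavaCC production in an ftl file) and 
it uses no Calcite API; the class GenericTypeSketch and its methods are 
hypothetical. It accepts the "ARRAY<BIGINT>" form from the original question 
and prints it in the "BIGINT ARRAY" postfix form discussed earlier in the 
thread.

```java
/** Toy recognizer for ARRAY<...> type syntax. Hypothetical; not Calcite code. */
final class GenericTypeSketch {
    private final String text;
    private int pos = 0;

    private GenericTypeSketch(String text) {
        this.text = text;
    }

    /** Parses a whole type expression and returns it in postfix form. */
    static String toPostfix(String typeText) {
        GenericTypeSketch parser = new GenericTypeSketch(typeText.trim());
        String result = parser.type();
        parser.skipSpaces();
        if (parser.pos != parser.text.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    // Grammar rule: type := "ARRAY" "<" type ">" | identifier
    private String type() {
        String name = identifier();
        skipSpaces();
        if (name.equalsIgnoreCase("ARRAY") && peek() == '<') {
            pos++; // consume '<'
            String element = type();
            skipSpaces();
            expect('>');
            return element + " ARRAY";
        }
        return name;
    }

    // Reads a run of letters, digits, and underscores.
    private String identifier() {
        skipSpaces();
        int start = pos;
        while (pos < text.length()
                && (Character.isLetterOrDigit(text.charAt(pos)) || text.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a type name at position " + pos);
        }
        return text.substring(start, pos);
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("ARRAY<BIGINT>")); // prints "BIGINT ARRAY"
    }
}
```

The recursive call inside type() is the part that matters: the element type 
is itself a type, so the rule has to refer to itself, and a grammar 
production for this syntax would have the same shape.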

Mihai

-----Original Message-----
From: Thomas Wang <w...@datability.io> 
Sent: Sunday, October 1, 2023 7:37 AM
To: dev@calcite.apache.org
Subject: Re: SQL Parsing

Hi Mihai,

Could you point me to more information w.r.t modifying the parser? Also, I 
noticed for the ARRAY type, I can do CAST(NULL AS BIGINT ARRAY), but the 
grammar doesn't seem to have MAP supported, does it? Did I miss something?
Thanks.

Thomas

On Sat, Sep 30, 2023 at 7:59 PM Mihai Budiu <mbu...@gmail.com> wrote:

> Calcite is open source, so you can certainly modify the parser. The 
> architecture of the parser has been designed to be flexible, and Calcite 
> itself has several parsers: the standard one, server, and Babel. Let 
> me know if you need help figuring out how to roll your own.
>
> Mihai
> ________________________________
> From: Thomas Wang <w...@datability.io>
> Sent: Saturday, September 30, 2023 7:42:19 PM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: SQL Parsing
>
> Thanks Mihai, "BIGINT ARRAY" seems to work! Just curious if I can have 
> flexibility to change the behavior of the parser to parse "ARRAY<BIGINT>"
> instead?
>
> Thomas
>
> On Sat, Sep 30, 2023 at 6:32 PM Mihai Budiu <mbu...@gmail.com> wrote:
>
> > The syntax is "bigint array"
> > https://calcite.apache.org/docs/reference.html
> > ________________________________
> > From: Thomas Wang <w...@datability.io>
> > Sent: Saturday, September 30, 2023 6:27:46 PM
> > To: dev@calcite.apache.org <dev@calcite.apache.org>
> > Subject: SQL Parsing
> >
> > Hi Apache Calcite Community,
> >
> > I'm new to Apache Calcite and trying to use it for SQL parsing and 
> > SQL rewriting.
> >
> > I tried to parse a couple of SELECT statements and it works pretty well.
> > However, when I tried to parse a SELECT statement that contains a 
> > cast to array like below, it complains it cannot recognize ARRAY.
> >
> > SELECT CAST(NULL AS ARRAY<BIGINT>) FROM schema.t1
> >
> > However, casting to BIGINT seems ok. The following SELECT is ok.
> >
> > SELECT CAST(NULL AS BIGINT) FROM schema.t1
> >
> > Is there any configuration I need to enable to support parsing 
> > ARRAY? Thanks.
> >
> > Thomas
> >
>
