>From my experience the Babel parser does NOT include DDL - at least not in the 
>released versions.

Mihai

-----Original Message-----
From: Askar Bozcan 
Sent: Friday, June 23, 2023 1:14 AM
To: [email protected]
Subject: Re: Calcite for Lineage

Hey Nathaniel,
To parse DDL statements, you need to use a different parser (see 
SqlParser.Config.withParserFactory
<https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/parser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql.parser.SqlParserImplFactory)>)
because core Calcite parser
<https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj>
does
not support DDL statements by design. You have two options:
1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as you've
said)
This is an extended parser that can also parse DDL queries.

2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This is also an 
extended parser that can not only parse DDL queries, but also many extra things 
not present in ISO SQL standard, such as Postgre's infix CAST operator *::* 
Since you're doing lineage generation I highly recommend using babel package 
for maximum compatibility with different DBs' queries.

*Mini-explanation on how parsing works in Calcite* Try Go To'ing to 
SqlParserImpl code in your IDE. You're going to see a huge file, full of almost 
nonsensical if's.
The reason is simple: Parser code is build-time generated Java code based on 
rules defined in core Calcite parser 
<https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj>.
Parser.jj is a JavaCC file; JavaCC <https://javacc.github.io/javacc/> is a 
parser generator that, based on rules you define, generates a pure Java code 
which can parse LL(k) grammars and generate the parse tree (SqlNode).

Parser.jj, as seen on the repo, is not a pure JavaCC file however, but an 
Apache FreeMarker template. There are strings in the Parser.jj file that start 
with *${ .* Those are placeholders used by Apache FreeMarker, a templating 
engine.
Extended parsers (babel, ddlparser) use those placeholders to insert their 
custom parsing rules without directly affecting the core parser file (it all 
still happens build-time, however).
How Babel parser does it, for example:
https://github.com/apache/calcite/blob/main/babel/src/main/codegen

*An advice for lineage generation*
If you're going to create a lineage generator, I highly recommend using a 
relational tree (RelNode tree) instead of parse tree (SqlNode) if you have 
access to DB tables.
After all, a lineage shows the relation between tables/columns, and so does a 
relational tree. There is even a built-in method for lineage:
getExpressionLineage
<https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/metadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcite.rel.RelNode,org.apache.calcite.rex.RexNode)>

Another advice, if you're planning to support different kinds of DBs, you will 
eventually run into something unparseable, and that will require the extension 
of the core Parser/babel Parser. Since it's build time, I suggest you 
submitting a PR to extend the parsers.

Good luck,
Askar



On 23 Jun 2023 Fri at 07:37 Nathaniel Vala <[email protected]>
wrote:

> Hi All,
>
> I have been trying to build a java tool that would let people map 
> lineage by reading sql scripts (i.e. views or insert into etc.) Im 
> having a little trouble with a couple of things and was hoping for some 
> pointers.
>
> Firstly, I cant seem to parse any DDL statements (so trouble with 
> `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in the 
> calcite-server module but cant really find anything).
> I decided to ignore the CREATE statements for the moment and just 
> process the query to get the sources in it which was working on simple 
> scripts but fail real quick when looking at things I've seen at enterprises.
>
> I have a GitHub repo here<https://github.com/Spydernaz/sqlLineage>, 
> some pointers would be awesome.
>
> Kind Regards,
> Nathaniel Vala
>

Reply via email to