Oh, true. It can parse CREATE TABLE however.
CREATE TABLE is pretty much the only DDL I needed to parse, so I didn't
really notice this.
Thanks for the correction!

- Askar




On Fri, 23 Jun 2023 at 19:43, <[email protected]> wrote:

> From my experience the Babel parser does NOT include DDL - at least not in
> the released versions.
>
> Mihai
>
> -----Original Message-----
> From: Askar Bozcan
> Sent: Friday, June 23, 2023 1:14 AM
> To: [email protected]
> Subject: Re: Calcite for Lineage
>
> Hey Nathaniel,
> To parse DDL statements, you need to use a different parser (see
> SqlParser.Config.withParserFactory
> <
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/parser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql.parser.SqlParserImplFactory)
> >)
> because core Calcite parser
> <
> https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> >
> does
> not support DDL statements by design. You have two options:
> 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as you've
> said)
> This is an extended parser that can also parse DDL queries.
>
> 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This is
> also an extended parser that can not only parse DDL queries, but also many
> extra things not present in ISO SQL standard, such as Postgre's infix CAST
> operator *::* Since you're doing lineage generation I highly recommend
> using babel package for maximum compatibility with different DBs' queries.
>
> *Mini-explanation on how parsing works in Calcite* Try Go To'ing to
> SqlParserImpl code in your IDE. You're going to see a huge file, full of
> almost nonsensical if's.
> The reason is simple: Parser code is build-time generated Java code based
> on rules defined in core Calcite parser <
> https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> >.
> Parser.jj is a JavaCC file; JavaCC <https://javacc.github.io/javacc/> is
> a parser generator that, based on rules you define, generates a pure Java
> code which can parse LL(k) grammars and generate the parse tree (SqlNode).
>
> Parser.jj, as seen on the repo, is not a pure JavaCC file however, but an
> Apache FreeMarker template. There are strings in the Parser.jj file that
> start with *${ .* Those are placeholders used by Apache FreeMarker, a
> templating engine.
> Extended parsers (babel, ddlparser) use those placeholders to insert their
> custom parsing rules without directly affecting the core parser file (it
> all still happens build-time, however).
> How Babel parser does it, for example:
> https://github.com/apache/calcite/blob/main/babel/src/main/codegen
>
> *An advice for lineage generation*
> If you're going to create a lineage generator, I highly recommend using a
> relational tree (RelNode tree) instead of parse tree (SqlNode) if you have
> access to DB tables.
> After all, a lineage shows the relation between tables/columns, and so
> does a relational tree. There is even a built-in method for lineage:
> getExpressionLineage
> <
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/metadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcite.rel.RelNode,org.apache.calcite.rex.RexNode)
> >
>
> Another advice, if you're planning to support different kinds of DBs, you
> will eventually run into something unparseable, and that will require the
> extension of the core Parser/babel Parser. Since it's build time, I suggest
> you submitting a PR to extend the parsers.
>
> Good luck,
> Askar
>
>
>
> On 23 Jun 2023 Fri at 07:37 Nathaniel Vala <[email protected]>
> wrote:
>
> > Hi All,
> >
> > I have been trying to build a java tool that would let people map
> > lineage by reading sql scripts (i.e. views or insert into etc.) Im
> > having a little trouble with a couple of things and was hoping for some
> pointers.
> >
> > Firstly, I cant seem to parse any DDL statements (so trouble with
> > `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in the
> > calcite-server module but cant really find anything).
> > I decided to ignore the CREATE statements for the moment and just
> > process the query to get the sources in it which was working on simple
> > scripts but fail real quick when looking at things I've seen at
> enterprises.
> >
> > I have a GitHub repo here<https://github.com/Spydernaz/sqlLineage>,
> > some pointers would be awesome.
> >
> > Kind Regards,
> > Nathaniel Vala
> >
>
>

Reply via email to