Also, the YEAR expression is not the only one I am having issues with, for 
example there is also CONVERT, “Quoted Column Names” etc. It feels like I must 
have missed a step somewhere or need to define something in the Parser.

Worst case, how would I be able to extend some of these functions?

Kind Regards,
Nathaniel Vala


On 2023/06/23 17:12:19 Askar Bozcan wrote:
> Oh, true. It can parse CREATE TABLE however.
> CREATE TABLE is pretty much the only DDL I needed to parse, so I didn't
> really notice this.
> Thanks for the correction!
>
> - Askar
>
>
>
>
> On Fri, 23 Jun 2023 at 19:43, <[email protected]<mailto:[email protected]>> wrote:
>
> > From my experience the Babel parser does NOT include DDL - at least not in
> > the released versions.
> >
> > Mihai
> >
> > -----Original Message-----
> > From: Askar Bozcan
> > Sent: Friday, June 23, 2023 1:14 AM
> > To: [email protected]<mailto:[email protected]>
> > Subject: Re: Calcite for Lineage
> >
> > Hey Nathaniel,
> > To parse DDL statements, you need to use a different parser (see
> > SqlParser.Config.withParserFactory
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/parser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql.parser.SqlParserImplFactory)
> > >)
> > because core Calcite parser
> > <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> > >
> > does
> > not support DDL statements by design. You have two options:
> > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as you've
> > said)
> > This is an extended parser that can also parse DDL queries.
> >
> > 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This is
> > also an extended parser that can not only parse DDL queries, but also many
> > extra things not present in ISO SQL standard, such as Postgre's infix CAST
> > operator *::* Since you're doing lineage generation I highly recommend
> > using babel package for maximum compatibility with different DBs' queries.
> >
> > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to
> > SqlParserImpl code in your IDE. You're going to see a huge file, full of
> > almost nonsensical if's.
> > The reason is simple: Parser code is build-time generated Java code based
> > on rules defined in core Calcite parser <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> > >.
> > Parser.jj is a JavaCC file; JavaCC 
> > <https://javacc.github.io/javacc/><https://javacc.github.io/javacc/%3e> is
> > a parser generator that, based on rules you define, generates a pure Java
> > code which can parse LL(k) grammars and generate the parse tree (SqlNode).
> >
> > Parser.jj, as seen on the repo, is not a pure JavaCC file however, but an
> > Apache FreeMarker template. There are strings in the Parser.jj file that
> > start with *${ .* Those are placeholders used by Apache FreeMarker, a
> > templating engine.
> > Extended parsers (babel, ddlparser) use those placeholders to insert their
> > custom parsing rules without directly affecting the core parser file (it
> > all still happens build-time, however).
> > How Babel parser does it, for example:
> > https://github.com/apache/calcite/blob/main/babel/src/main/codegen
> >
> > *An advice for lineage generation*
> > If you're going to create a lineage generator, I highly recommend using a
> > relational tree (RelNode tree) instead of parse tree (SqlNode) if you have
> > access to DB tables.
> > After all, a lineage shows the relation between tables/columns, and so
> > does a relational tree. There is even a built-in method for lineage:
> > getExpressionLineage
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/metadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcite.rel.RelNode,org.apache.calcite.rex.RexNode)
> > >
> >
> > Another advice, if you're planning to support different kinds of DBs, you
> > will eventually run into something unparseable, and that will require the
> > extension of the core Parser/babel Parser. Since it's build time, I suggest
> > you submitting a PR to extend the parsers.
> >
> > Good luck,
> > Askar
> >
> >
> >
> > On 23 Jun 2023 Fri at 07:37 Nathaniel Vala 
> > <[email protected]<mailto:[email protected]>>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have been trying to build a java tool that would let people map
> > > lineage by reading sql scripts (i.e. views or insert into etc.) Im
> > > having a little trouble with a couple of things and was hoping for some
> > pointers.
> > >
> > > Firstly, I cant seem to parse any DDL statements (so trouble with
> > > `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in the
> > > calcite-server module but cant really find anything).
> > > I decided to ignore the CREATE statements for the moment and just
> > > process the query to get the sources in it which was working on simple
> > > scripts but fail real quick when looking at things I've seen at
> > enterprises.
> > >
> > > I have a GitHub repo 
> > > here<https://github.com/Spydernaz/sqlLineage><https://github.com/Spydernaz/sqlLineage%3e>,
> > > some pointers would be awesome.
> > >
> > > Kind Regards,
> > > Nathaniel Vala
> > >
> >
> >
>

Reply via email to