Which parser are you using, the server (DDL) or Babel?
I think DATEADD exists only in Babel.
To solve this problem in our project we have combined both parsers into one,
so now we can parse Babel + DDL.
https://github.com/feldera/dbsp/pull/276

Mihai

-----Original Message-----
From: Nathaniel Vala 
Sent: Thursday, July 06, 2023 9:13 PM
To: [email protected]
Subject: RE: Re: Calcite for Lineage

Hi Askar and Mihai,

Thanks for the help. I have modified the code to use a DDL parser, Minhai
was right about the Babel one not having all the DDL statements in it. I
have managed to parse this but still falling over at token that should
parse.

When parsing queries using `SqlDdlParserImpl.FACTORY` and `Lex.SQL_SERVER` I
try to parse a simple DATEADD function and it's failing.

```sql
SELECT  DATEADD(year, 1, '2017/08/25') AS DateAdd FROM unienrollements t1
```

Java throws an `InvocationTargetException` near 'YEAR' and that it was
expecting <some list here> However in that list, it has YEAR and YEAR (. To
me, this should be able to parse, have I missed something in the config to
enable this?

If you need some code, this can be found here:
https://github.com/Spydernaz/sqlLineage


Furthermore, thanks for the suggestion regarding the RelMetadata but for the
life of me cannot config it. Seems like I need to have a full connection
established and fetch live schemas for it to validate against? This isn't
really an option for me at the moment. I will upload the RelMetadata attempt
in a separate branch of this repo

On 2023/06/23 17:12:19 Askar Bozcan wrote:
> Oh, true. It can parse CREATE TABLE however.
> CREATE TABLE is pretty much the only DDL I needed to parse, so I 
> didn't really notice this.
> Thanks for the correction!
>
> - Askar
>
>
>
>
> On Fri, 23 Jun 2023 at 19:43, <[email protected]<mailto:[email protected]>>
wrote:
>
> > From my experience the Babel parser does NOT include DDL - at least 
> > not in the released versions.
> >
> > Mihai
> >
> > -----Original Message-----
> > From: Askar Bozcan
> > Sent: Friday, June 23, 2023 1:14 AM
> > To: [email protected]<mailto:[email protected]>
> > Subject: Re: Calcite for Lineage
> >
> > Hey Nathaniel,
> > To parse DDL statements, you need to use a different parser (see 
> > SqlParser.Config.withParserFactory
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/p
> > arser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql
> > .parser.SqlParserImplFactory)
> > >)
> > because core Calcite parser
> > <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te
> > mplates/Parser.jj
> > >
> > does
> > not support DDL statements by design. You have two options:
> > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as 
> > you've
> > said)
> > This is an extended parser that can also parse DDL queries.
> >
> > 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This 
> > is also an extended parser that can not only parse DDL queries, but 
> > also many extra things not present in ISO SQL standard, such as 
> > Postgre's infix CAST operator *::* Since you're doing lineage 
> > generation I highly recommend using babel package for maximum
compatibility with different DBs' queries.
> >
> > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to 
> > SqlParserImpl code in your IDE. You're going to see a huge file, 
> > full of almost nonsensical if's.
> > The reason is simple: Parser code is build-time generated Java code 
> > based on rules defined in core Calcite parser < 
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te
> > mplates/Parser.jj
> > >.
> > Parser.jj is a JavaCC file; JavaCC 
> > <https://javacc.github.io/javacc/><https://javacc.github.io/javacc/%
> > 3e> is a parser generator that, based on rules you define, generates a
pure Java code which can parse LL(k) grammars and generate the parse tree
(SqlNode).
> >
> > Parser.jj, as seen on the repo, is not a pure JavaCC file however, 
> > but an Apache FreeMarker template. There are strings in the 
> > Parser.jj file that start with *${ .* Those are placeholders used by 
> > Apache FreeMarker, a templating engine.
> > Extended parsers (babel, ddlparser) use those placeholders to insert 
> > their custom parsing rules without directly affecting the core 
> > parser file (it all still happens build-time, however).
> > How Babel parser does it, for example:
> > https://github.com/apache/calcite/blob/main/babel/src/main/codegen
> >
> > *An advice for lineage generation*
> > If you're going to create a lineage generator, I highly recommend 
> > using a relational tree (RelNode tree) instead of parse tree 
> > (SqlNode) if you have access to DB tables.
> > After all, a lineage shows the relation between tables/columns, and 
> > so does a relational tree. There is even a built-in method for lineage:
> > getExpressionLineage
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/m
> > etadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcit
> > e.rel.RelNode,org.apache.calcite.rex.RexNode)
> > >
> >
> > Another advice, if you're planning to support different kinds of 
> > DBs, you will eventually run into something unparseable, and that 
> > will require the extension of the core Parser/babel Parser. Since 
> > it's build time, I suggest you submitting a PR to extend the parsers.
> >
> > Good luck,
> > Askar
> >
> >
> >
> > On 23 Jun 2023 Fri at 07:37 Nathaniel Vala 
> > <[email protected]<mailto:[email protected]>>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have been trying to build a java tool that would let people map 
> > > lineage by reading sql scripts (i.e. views or insert into etc.) Im 
> > > having a little trouble with a couple of things and was hoping for 
> > > some
> > pointers.
> > >
> > > Firstly, I cant seem to parse any DDL statements (so trouble with 
> > > `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in 
> > > the calcite-server module but cant really find anything).
> > > I decided to ignore the CREATE statements for the moment and just 
> > > process the query to get the sources in it which was working on 
> > > simple scripts but fail real quick when looking at things I've 
> > > seen at
> > enterprises.
> > >
> > > I have a GitHub repo 
> > > here<https://github.com/Spydernaz/sqlLineage><https://github.com/S
> > > pydernaz/sqlLineage%3e>,
> > > some pointers would be awesome.
> > >
> > > Kind Regards,
> > > Nathaniel Vala
> > >
> >
> >
>

Reply via email to