Which parser are you using, the server (DDL) or Babel? I think DATEADD exists only in Babel. To solve this problem in our project we have combined both parsers into one, so now we can parse Babel + DDL. https://github.com/feldera/dbsp/pull/276
Mihai -----Original Message----- From: Nathaniel Vala Sent: Thursday, July 06, 2023 9:13 PM To: [email protected] Subject: RE: Re: Calcite for Lineage Hi Askar and Mihai, Thanks for the help. I have modified the code to use a DDL parser, Minhai was right about the Babel one not having all the DDL statements in it. I have managed to parse this but still falling over at token that should parse. When parsing queries using `SqlDdlParserImpl.FACTORY` and `Lex.SQL_SERVER` I try to parse a simple DATEADD function and it's failing. ```sql SELECT DATEADD(year, 1, '2017/08/25') AS DateAdd FROM unienrollements t1 ``` Java throws an `InvocationTargetException` near 'YEAR' and that it was expecting <some list here> However in that list, it has YEAR and YEAR (. To me, this should be able to parse, have I missed something in the config to enable this? If you need some code, this can be found here: https://github.com/Spydernaz/sqlLineage Furthermore, thanks for the suggestion regarding the RelMetadata but for the life of me cannot config it. Seems like I need to have a full connection established and fetch live schemas for it to validate against? This isn't really an option for me at the moment. I will upload the RelMetadata attempt in a separate branch of this repo On 2023/06/23 17:12:19 Askar Bozcan wrote: > Oh, true. It can parse CREATE TABLE however. > CREATE TABLE is pretty much the only DDL I needed to parse, so I > didn't really notice this. > Thanks for the correction! > > - Askar > > > > > On Fri, 23 Jun 2023 at 19:43, <[email protected]<mailto:[email protected]>> wrote: > > > From my experience the Babel parser does NOT include DDL - at least > > not in the released versions. > > > > Mihai > > > > -----Original Message----- > > From: Askar Bozcan > > Sent: Friday, June 23, 2023 1:14 AM > > To: [email protected]<mailto:[email protected]> > > Subject: Re: Calcite for Lineage > > > > Hey Nathaniel, > > To parse DDL statements, you need to use a different parser (see > > SqlParser.Config.withParserFactory > > < > > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/p > > arser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql > > .parser.SqlParserImplFactory) > > >) > > because core Calcite parser > > < > > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te > > mplates/Parser.jj > > > > > does > > not support DDL statements by design. You have two options: > > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as > > you've > > said) > > This is an extended parser that can also parse DDL queries. > > > > 2) Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This > > is also an extended parser that can not only parse DDL queries, but > > also many extra things not present in ISO SQL standard, such as > > Postgre's infix CAST operator *::* Since you're doing lineage > > generation I highly recommend using babel package for maximum compatibility with different DBs' queries. > > > > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to > > SqlParserImpl code in your IDE. You're going to see a huge file, > > full of almost nonsensical if's. > > The reason is simple: Parser code is build-time generated Java code > > based on rules defined in core Calcite parser < > > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te > > mplates/Parser.jj > > >. > > Parser.jj is a JavaCC file; JavaCC > > <https://javacc.github.io/javacc/><https://javacc.github.io/javacc/% > > 3e> is a parser generator that, based on rules you define, generates a pure Java code which can parse LL(k) grammars and generate the parse tree (SqlNode). > > > > Parser.jj, as seen on the repo, is not a pure JavaCC file however, > > but an Apache FreeMarker template. There are strings in the > > Parser.jj file that start with *${ .* Those are placeholders used by > > Apache FreeMarker, a templating engine. > > Extended parsers (babel, ddlparser) use those placeholders to insert > > their custom parsing rules without directly affecting the core > > parser file (it all still happens build-time, however). > > How Babel parser does it, for example: > > https://github.com/apache/calcite/blob/main/babel/src/main/codegen > > > > *An advice for lineage generation* > > If you're going to create a lineage generator, I highly recommend > > using a relational tree (RelNode tree) instead of parse tree > > (SqlNode) if you have access to DB tables. > > After all, a lineage shows the relation between tables/columns, and > > so does a relational tree. There is even a built-in method for lineage: > > getExpressionLineage > > < > > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/m > > etadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcit > > e.rel.RelNode,org.apache.calcite.rex.RexNode) > > > > > > > Another advice, if you're planning to support different kinds of > > DBs, you will eventually run into something unparseable, and that > > will require the extension of the core Parser/babel Parser. Since > > it's build time, I suggest you submitting a PR to extend the parsers. > > > > Good luck, > > Askar > > > > > > > > On 23 Jun 2023 Fri at 07:37 Nathaniel Vala > > <[email protected]<mailto:[email protected]>> > > wrote: > > > > > Hi All, > > > > > > I have been trying to build a java tool that would let people map > > > lineage by reading sql scripts (i.e. views or insert into etc.) Im > > > having a little trouble with a couple of things and was hoping for > > > some > > pointers. > > > > > > Firstly, I cant seem to parse any DDL statements (so trouble with > > > `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in > > > the calcite-server module but cant really find anything). > > > I decided to ignore the CREATE statements for the moment and just > > > process the query to get the sources in it which was working on > > > simple scripts but fail real quick when looking at things I've > > > seen at > > enterprises. > > > > > > I have a GitHub repo > > > here<https://github.com/Spydernaz/sqlLineage><https://github.com/S > > > pydernaz/sqlLineage%3e>, > > > some pointers would be awesome. > > > > > > Kind Regards, > > > Nathaniel Vala > > > > > > > >
