Oh, true. Babel can parse CREATE TABLE, however. CREATE TABLE is pretty much the only DDL statement I needed to parse, so I didn't really notice this. Thanks for the correction!
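For the DDL part of the original question, the parser-factory route from Askar's earlier reply can be sketched as below. This is a minimal sketch, assuming calcite-core and calcite-server are on the classpath; the DdlParseSketch class and its helpers are hypothetical names. It parses the same CREATE VIEW with the core parser (which has no CREATE rules) and with SqlDdlParserImpl.FACTORY, then pulls out the inner SELECT, since that query is what a lineage tool actually needs to analyse.

```java
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.ddl.SqlCreateView;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.ddl.SqlDdlParserImpl;

public class DdlParseSketch {

  // Default config uses the core parser, which has no CREATE/DROP rules.
  static final SqlParser.Config CORE = SqlParser.config();

  // Same defaults, but with the DDL-capable parser from calcite-server.
  static final SqlParser.Config DDL =
      SqlParser.config().withParserFactory(SqlDdlParserImpl.FACTORY);

  /** Returns true if the parser built from {@code config} accepts {@code sql}. */
  static boolean accepts(String sql, SqlParser.Config config) {
    try {
      SqlParser.create(sql, config).parseStmt();
      return true;
    } catch (SqlParseException e) {
      return false;
    }
  }

  /** Returns the query behind a CREATE VIEW; that query is what lineage is computed from. */
  static SqlNode viewQuery(String sql) throws SqlParseException {
    SqlNode node = SqlParser.create(sql, DDL).parseStmt();
    if (node.getKind() != SqlKind.CREATE_VIEW) {
      throw new IllegalArgumentException("Not a CREATE VIEW: " + node.getKind());
    }
    return ((SqlCreateView) node).query;
  }

  public static void main(String[] args) throws SqlParseException {
    String sql = "CREATE VIEW v AS SELECT a, b FROM t WHERE a > 1";
    System.out.println(accepts(sql, CORE)); // false
    System.out.println(accepts(sql, DDL));  // true
    System.out.println(viewQuery(sql));     // the inner SELECT, unparsed back to SQL text
  }
}
```

Switching to the Babel parser is the same withParserFactory call with SqlBabelParserImpl.FACTORY from calcite-babel. Given Mihai's note in this thread about Babel's limited DDL support in released versions, it is worth checking with accepts(...) that your target statements actually parse before relying on it.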
- Askar

On Fri, 23 Jun 2023 at 19:43, <[email protected]> wrote:

> From my experience, the Babel parser does NOT include DDL - at least not in
> the released versions.
>
> Mihai
>
> -----Original Message-----
> From: Askar Bozcan
> Sent: Friday, June 23, 2023 1:14 AM
> To: [email protected]
> Subject: Re: Calcite for Lineage
>
> Hey Nathaniel,
> To parse DDL statements, you need to use a different parser (see
> SqlParser.Config.withParserFactory
> <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/parser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql.parser.SqlParserImplFactory)>),
> because the core Calcite parser
> <https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj>
> does not support DDL statements by design. You have two options:
>
> 1) Use SqlDdlParserImpl.FACTORY (from the calcite-server package, as you've
> said). This is an extended parser that can also parse DDL queries.
>
> 2) Use SqlBabelParserImpl.FACTORY (from the calcite-babel package). This is
> also an extended parser that can parse not only DDL queries, but also many
> extra things not present in the ISO SQL standard, such as PostgreSQL's infix
> CAST operator *::*. Since you're doing lineage generation, I highly recommend
> using the babel package for maximum compatibility with different DBs' queries.
>
> *Mini-explanation on how parsing works in Calcite*
> Try using Go To on SqlParserImpl in your IDE. You're going to see a huge
> file, full of almost nonsensical ifs.
> The reason is simple: the parser code is Java code generated at build time
> from rules defined in the core Calcite parser
> <https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj>.
> Parser.jj is a JavaCC file; JavaCC <https://javacc.github.io/javacc/> is
> a parser generator that, based on rules you define, generates pure Java
> code which can parse LL(k) grammars and generate the parse tree (SqlNode).
>
> Parser.jj, as seen in the repo, is not a pure JavaCC file, however, but an
> Apache FreeMarker template. There are strings in the Parser.jj file that
> start with *${*. Those are placeholders used by Apache FreeMarker, a
> templating engine.
> Extended parsers (babel, ddlparser) use those placeholders to insert their
> custom parsing rules without directly affecting the core parser file (it
> all still happens at build time, however).
> How the Babel parser does it, for example:
> https://github.com/apache/calcite/blob/main/babel/src/main/codegen
>
> *Advice for lineage generation*
> If you're going to create a lineage generator, I highly recommend using a
> relational tree (RelNode tree) instead of a parse tree (SqlNode) if you have
> access to the DB tables.
> After all, a lineage shows the relations between tables/columns, and so
> does a relational tree. There is even a built-in method for lineage:
> getExpressionLineage
> <https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/metadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcite.rel.RelNode,org.apache.calcite.rex.RexNode)>
>
> Another piece of advice: if you're planning to support different kinds of
> DBs, you will eventually run into something unparseable, and that will
> require extending the core parser or the Babel parser. Since the parsers are
> generated at build time, I suggest submitting a PR to extend them.
>
> Good luck,
> Askar
>
>
> On 23 Jun 2023 Fri at 07:37 Nathaniel Vala <[email protected]>
> wrote:
>
> > Hi All,
> >
> > I have been trying to build a Java tool that would let people map
> > lineage by reading SQL scripts (e.g. views or INSERT INTO etc.). I'm
> > having a little trouble with a couple of things and was hoping for some
> > pointers.
> >
> > Firstly, I can't seem to parse any DDL statements (so trouble with
> > `CREATE VIEW AS [SQL QUERY]`). I understand this is meant to be in the
> > calcite-server module but can't really find anything.
> >
> > I decided to ignore the CREATE statements for the moment and just
> > process the query to get the sources in it, which was working on simple
> > scripts but fails quickly on the kind of SQL I've seen at enterprises.
> >
> > I have a GitHub repo here <https://github.com/Spydernaz/sqlLineage>;
> > some pointers would be awesome.
> >
> > Kind Regards,
> > Nathaniel Vala
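Askar's suggestion to work on the RelNode tree and call getExpressionLineage can be sketched as follows. This is a minimal sketch, assuming calcite-core on the classpath; the Hr/Emp/Dept classes, the lineageOf helper, and the use of ReflectiveSchema to supply table metadata are hypothetical stand-ins for however your tool registers the real DB tables. It plans a query with the Frameworks planner and asks the metadata query which table columns each output column comes from.

```java
import java.util.Set;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.config.Lex;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class LineageSketch {

  // Hypothetical tables: ReflectiveSchema exposes public array fields as tables.
  // Empty arrays are enough because we only plan the query, never execute it.
  public static class Hr {
    public final Emp[] emps = {};
    public final Dept[] depts = {};
  }

  public static class Emp {
    public int empid;
    public int deptno;
    public String ename;
  }

  public static class Dept {
    public int deptno;
    public String dname;
  }

  /** Returns the table columns that output column {@code column} of {@code sql} derives from. */
  static Set<RexNode> lineageOf(String sql, int column) throws Exception {
    SchemaPlus root = Frameworks.createRootSchema(true);
    SchemaPlus hr = root.add("hr", new ReflectiveSchema(new Hr()));
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(hr)
        // Lex.JAVA leaves unquoted identifiers as written, matching the Java field names.
        .parserConfig(SqlParser.config().withLex(Lex.JAVA))
        .build();
    Planner planner = Frameworks.getPlanner(config);
    SqlNode validated = planner.validate(planner.parse(sql));
    RelNode rel = planner.rel(validated).project();
    RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
    // Elements are RexTableInputRefs (source table plus column index);
    // null means Calcite could not derive lineage for this expression.
    return mq.getExpressionLineage(rel, RexInputRef.of(column, rel.getRowType()));
  }

  public static void main(String[] args) throws Exception {
    String sql = "select e.ename, d.dname from emps e join depts d on e.deptno = d.deptno";
    System.out.println(lineageOf(sql, 0)); // refers to a column of emps
    System.out.println(lineageOf(sql, 1)); // refers to a column of depts
  }
}
```

Because the validator has already resolved aliases and expanded `*` before the RelNode tree is built, this route avoids much of the hand-written SqlNode walking that breaks on larger scripts; the cost is that every referenced table has to be registered in a schema first, and some shapes (outer joins, for instance) may yield null.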
