*Regarding RelMetadata*, you don't necessarily need a connection, but you
DO need to provide a schema somehow. It can be done programmatically as
well. Otherwise, for example, how would you expand the asterisk `SELECT *
FROM A.B` when you don't know what's in the table A.B.
I'm sure something can be done to auto-generate the schema by inferring
them from the queries, but no such thing exists yet, AFAIK.
So, if you want to do schema-less lineage generation, doing it on the parse
tree level (SqlNode) does seem to be the only option.

*Regarding CONVERT*, it looks like you're trying to parse MSSQL's CONVERT
function. You'll be happy to hear that it will be supported in 1.35.0. I
had the same issue as you with it, so I added a PR to extend the core
parser. See PR #3100 <https://github.com/apache/calcite/pull/3100>

Note on CONVERT and functions:
Normally, Calcite has a parser rule that generically parses any "generic"
function (aka in the form of F_NAME(arg1, arg2, ...)) and you can even
define your own functions to use as well. Try parsing *THISFUNCISFAKE(DATE,
5,0)* It should parse just fine.

The problem lies in CONVERT itself. There are parser rules specifically for
CONVERT preceding the "generic function" rule which parses *CONVERT(x USING
y)*. Since this rule precedes the "generic function" rule, parser expects
the *CONVERT(x USING y)* syntax for the CONVERT function, hence it is
unable to parse a more "generic" form of MSSQL CONVERT, requiring a parser
extension.

Hope that helps,
Askar



On 8 Jul 2023 Sat at 02:32 <[email protected]> wrote:

> Which parser are you using, the server (DDL) or Babel?
> I think DATEADD exists only in Babel.
> To solve this problem in our project we have combined both parsers into
> one,
> so now we can parse Babel + DDL.
> https://github.com/feldera/dbsp/pull/276
>
> Mihai
>
> -----Original Message-----
> From: Nathaniel Vala
> Sent: Thursday, July 06, 2023 9:13 PM
> To: [email protected]
> Subject: RE: Re: Calcite for Lineage
>
> Hi Askar and Mihai,
>
> Thanks for the help. I have modified the code to use a DDL parser, Minhai
> was right about the Babel one not having all the DDL statements in it. I
> have managed to parse this but still falling over at token that should
> parse.
>
> When parsing queries using `SqlDdlParserImpl.FACTORY` and `Lex.SQL_SERVER`
> I
> try to parse a simple DATEADD function and it's failing.
>
> ```sql
> SELECT  DATEADD(year, 1, '2017/08/25') AS DateAdd FROM unienrollements t1
> ```
>
> Java throws an `InvocationTargetException` near 'YEAR' and that it was
> expecting <some list here> However in that list, it has YEAR and YEAR (. To
> me, this should be able to parse, have I missed something in the config to
> enable this?
>
> If you need some code, this can be found here:
> https://github.com/Spydernaz/sqlLineage
>
>
> Furthermore, thanks for the suggestion regarding the RelMetadata but for
> the
> life of me cannot config it. Seems like I need to have a full connection
> established and fetch live schemas for it to validate against? This isn't
> really an option for me at the moment. I will upload the RelMetadata
> attempt
> in a separate branch of this repo
>
> On 2023/06/23 17:12:19 Askar Bozcan wrote:
> > Oh, true. It can parse CREATE TABLE however.
> > CREATE TABLE is pretty much the only DDL I needed to parse, so I
> > didn't really notice this.
> > Thanks for the correction!
> >
> > - Askar
> >
> >
> >
> >
> > On Fri, 23 Jun 2023 at 19:43, <[email protected]<mailto:[email protected]>>
> wrote:
> >
> > > From my experience the Babel parser does NOT include DDL - at least
> > > not in the released versions.
> > >
> > > Mihai
> > >
> > > -----Original Message-----
> > > From: Askar Bozcan
> > > Sent: Friday, June 23, 2023 1:14 AM
> > > To: [email protected]<mailto:[email protected]>
> > > Subject: Re: Calcite for Lineage
> > >
> > > Hey Nathaniel,
> > > To parse DDL statements, you need to use a different parser (see
> > > SqlParser.Config.withParserFactory
> > > <
> > > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/p
> > > arser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql
> > > .parser.SqlParserImplFactory)
> > > >)
> > > because core Calcite parser
> > > <
> > > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te
> > > mplates/Parser.jj
> > > >
> > > does
> > > not support DDL statements by design. You have two options:
> > > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as
> > > you've
> > > said)
> > > This is an extended parser that can also parse DDL queries.
> > >
> > > 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This
> > > is also an extended parser that can not only parse DDL queries, but
> > > also many extra things not present in ISO SQL standard, such as
> > > Postgre's infix CAST operator *::* Since you're doing lineage
> > > generation I highly recommend using babel package for maximum
> compatibility with different DBs' queries.
> > >
> > > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to
> > > SqlParserImpl code in your IDE. You're going to see a huge file,
> > > full of almost nonsensical if's.
> > > The reason is simple: Parser code is build-time generated Java code
> > > based on rules defined in core Calcite parser <
> > > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te
> > > mplates/Parser.jj
> > > >.
> > > Parser.jj is a JavaCC file; JavaCC
> > > <https://javacc.github.io/javacc/><https://javacc.github.io/javacc/%
> > > 3e> is a parser generator that, based on rules you define, generates a
> pure Java code which can parse LL(k) grammars and generate the parse tree
> (SqlNode).
> > >
> > > Parser.jj, as seen on the repo, is not a pure JavaCC file however,
> > > but an Apache FreeMarker template. There are strings in the
> > > Parser.jj file that start with *${ .* Those are placeholders used by
> > > Apache FreeMarker, a templating engine.
> > > Extended parsers (babel, ddlparser) use those placeholders to insert
> > > their custom parsing rules without directly affecting the core
> > > parser file (it all still happens build-time, however).
> > > How Babel parser does it, for example:
> > > https://github.com/apache/calcite/blob/main/babel/src/main/codegen
> > >
> > > *An advice for lineage generation*
> > > If you're going to create a lineage generator, I highly recommend
> > > using a relational tree (RelNode tree) instead of parse tree
> > > (SqlNode) if you have access to DB tables.
> > > After all, a lineage shows the relation between tables/columns, and
> > > so does a relational tree. There is even a built-in method for lineage:
> > > getExpressionLineage
> > > <
> > > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/m
> > > etadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcit
> > > e.rel.RelNode,org.apache.calcite.rex.RexNode)
> > > >
> > >
> > > Another advice, if you're planning to support different kinds of
> > > DBs, you will eventually run into something unparseable, and that
> > > will require the extension of the core Parser/babel Parser. Since
> > > it's build time, I suggest you submitting a PR to extend the parsers.
> > >
> > > Good luck,
> > > Askar
> > >
> > >
> > >
> > > On 23 Jun 2023 Fri at 07:37 Nathaniel Vala
> > > <[email protected]<mailto:[email protected]>>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have been trying to build a java tool that would let people map
> > > > lineage by reading sql scripts (i.e. views or insert into etc.) Im
> > > > having a little trouble with a couple of things and was hoping for
> > > > some
> > > pointers.
> > > >
> > > > Firstly, I cant seem to parse any DDL statements (so trouble with
> > > > `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in
> > > > the calcite-server module but cant really find anything).
> > > > I decided to ignore the CREATE statements for the moment and just
> > > > process the query to get the sources in it which was working on
> > > > simple scripts but fail real quick when looking at things I've
> > > > seen at
> > > enterprises.
> > > >
> > > > I have a GitHub repo
> > > > here<https://github.com/Spydernaz/sqlLineage><https://github.com/S
> > > > pydernaz/sqlLineage%3e>,
> > > > some pointers would be awesome.
> > > >
> > > > Kind Regards,
> > > > Nathaniel Vala
> > > >
> > >
> > >
> >
>
>

Reply via email to