Thanks for the example. I’ve heard several definitions of “query lineage” over 
the years, so it really helps clarify.

I’m pretty sure we can do this based SqlNode (we would need to parse and 
validate; without validation, you are never sure whether “table1” is the name 
of a table or a view or an alias somewhere else in the query).

Can you come up two or three more examples? From these we can write some unit 
tests, and from unit tests we can implement a feature.

Julian


> On Apr 11, 2018, at 2:48 PM, AnilKumar B <[email protected]> wrote:
> 
> Thanks for the quick response Enrico.
> 
>>> Not sure if I understand correctly, but you are looking for something
> that
> given an SQL statement and metadata about existing tables it returns one
> (or more) access plan ?
> Yes, we are looking for table/DB object level plan
> 
> Just for example, in the case below query,
> 
> SqlNode node5 = getSqlParser("insert into table5 select a.c1, a.c2
> from (select * from table1 join table2 on table1.c1 = table2.c1 )
> tmp_table").parseStmt();
> 
> The table level lineage will be [table1, table2]  -> [ tmp_table ] ->
> [ table5 ]. And this we are trying to get from above sqlNode object.
> 
> 
> May, I know, how to parse MySQL/Vertica/Hive queries? Is it through
> Calcite JDBC?
> 
> 
> 
> 
> 
> 
> Thanks & Regards,
> B Anil Kumar.
> 
> On Wed, Apr 11, 2018 at 1:45 PM, Enrico Olivelli <[email protected]>
> wrote:
> 
>> Il mer 11 apr 2018, 22:34 AnilKumar B <[email protected]> ha scritto:
>> 
>>> Hi All,
>>> 
>>> I am new to calcite. Need your help/inputs on usage of calcite to solve
>> one
>>> of our use case.
>>> 
>>> We need to build a library to get the lineage graph of given query. Here
>>> given query can be either Hive/ANSI SQL/Vertica queries etc.
>>> 
>>> We can build this library by using their respective physical plans. But,
>>> there are few issues as  mentioned below
>>> 1. We need develop the library specific to each source
>>> 2. In most of the sources, there is a limitation on explain query for
>>> *create
>>> table* queries.
>>> 3. To get the explain plan, we need to have all the tables (which we are
>>> used in query) should be in the in their respective meta stores. So, it
>>> might cause the issues with tmp tables from previous queries.
>>> 
>>> So, to avoid above issues, we are trying to develop a generic library
>> which
>>> can work for any source. For that, we are trying to use calcite parser's
>>> SQLNode to derive the lineage for given query.
>>> 
>>> We started with ANSI SQL queries and as of now, it's going good. But any
>>> inputs from community
>>> on below will be great help for us.
>>> 1. Does calcite is correct fit for our use case?
>>> 2. How to parse the JDBC queries and get the lineage?
>>> 
>> 
>> Not sure if I understand correctly, but you are looking for something that
>> given an SQL statement and metadata about existing tables it returns one
>> (or more) access plan ?
>> In this case Calcite is very a good choice. I will be happy to provide some
>> example if this is your case.
>> 
>> Enrico
>> 
>> 
>>> 
>>> Thanks & Regards,
>>> B Anil Kumar.
>>> 
>> --
>> 
>> 
>> -- Enrico Olivelli
>> 

Reply via email to