Thanks for the help Julian. By lineage we mean given a query as an input to the API , we would like to get *input table list* and *output table* as the output (Intermediate temp tables are optional in lineage). Below are few examples in hive and in vertica
*Hive Example :* FROM page_view_stg pvs INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country) SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt; *Expected Output:* Input table list : page_view_stg output table list: page_view *Vertica Example: * CREATE TABLE TEST_LINEAGE AS SELECT employee_first_name, employee_last_name, annual_salary, employee_region FROM employee_dimension WHERE annual_salary IN (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY employee_region) UNSEGMENTED ALL NODES ; *Expected Output:* Input Table list: employee_dimension output table: TEST_LINEAGE So, we are trying to include it in Calcite and use it for our use case. Thanks & Regards, B Anil Kumar. On Wed, Apr 11, 2018 at 3:34 PM, Julian Hyde <jh...@apache.org> wrote: > Thanks for the example. I’ve heard several definitions of “query lineage” > over the years, so it really helps clarify. > > I’m pretty sure we can do this based SqlNode (we would need to parse and > validate; without validation, you are never sure whether “table1” is the > name of a table or a view or an alias somewhere else in the query). > > Can you come up two or three more examples? From these we can write some > unit tests, and from unit tests we can implement a feature. > > Julian > > > > On Apr 11, 2018, at 2:48 PM, AnilKumar B <akumarb2...@gmail.com> wrote: > > > > Thanks for the quick response Enrico. > > > >>> Not sure if I understand correctly, but you are looking for something > > that > > given an SQL statement and metadata about existing tables it returns one > > (or more) access plan ? > > Yes, we are looking for table/DB object level plan > > > > Just for example, in the case below query, > > > > SqlNode node5 = getSqlParser("insert into table5 select a.c1, a.c2 > > from (select * from table1 join table2 on table1.c1 = table2.c1 ) > > tmp_table").parseStmt(); > > > > The table level lineage will be [table1, table2] -> [ tmp_table ] -> > > [ table5 ]. And this we are trying to get from above sqlNode object. > > > > > > May, I know, how to parse MySQL/Vertica/Hive queries? Is it through > > Calcite JDBC? > > > > > > > > > > > > > > Thanks & Regards, > > B Anil Kumar. > > > > On Wed, Apr 11, 2018 at 1:45 PM, Enrico Olivelli <eolive...@gmail.com> > > wrote: > > > >> Il mer 11 apr 2018, 22:34 AnilKumar B <akumarb2...@gmail.com> ha > scritto: > >> > >>> Hi All, > >>> > >>> I am new to calcite. Need your help/inputs on usage of calcite to solve > >> one > >>> of our use case. > >>> > >>> We need to build a library to get the lineage graph of given query. > Here > >>> given query can be either Hive/ANSI SQL/Vertica queries etc. > >>> > >>> We can build this library by using their respective physical plans. > But, > >>> there are few issues as mentioned below > >>> 1. We need develop the library specific to each source > >>> 2. In most of the sources, there is a limitation on explain query for > >>> *create > >>> table* queries. > >>> 3. To get the explain plan, we need to have all the tables (which we > are > >>> used in query) should be in the in their respective meta stores. So, it > >>> might cause the issues with tmp tables from previous queries. > >>> > >>> So, to avoid above issues, we are trying to develop a generic library > >> which > >>> can work for any source. For that, we are trying to use calcite > parser's > >>> SQLNode to derive the lineage for given query. > >>> > >>> We started with ANSI SQL queries and as of now, it's going good. But > any > >>> inputs from community > >>> on below will be great help for us. > >>> 1. Does calcite is correct fit for our use case? > >>> 2. How to parse the JDBC queries and get the lineage? > >>> > >> > >> Not sure if I understand correctly, but you are looking for something > that > >> given an SQL statement and metadata about existing tables it returns one > >> (or more) access plan ? > >> In this case Calcite is very a good choice. I will be happy to provide > some > >> example if this is your case. > >> > >> Enrico > >> > >> > >>> > >>> Thanks & Regards, > >>> B Anil Kumar. > >>> > >> -- > >> > >> > >> -- Enrico Olivelli > >> > >