Hi Julian, May I know, how to parse MySQL/Vertica/Hive queries through Calcite? Is it through Calcite JDBC?
Thanks & Regards, B Anil Kumar. On Wed, Apr 11, 2018 at 6:56 PM, AnilKumar B <[email protected]> wrote: > Thanks for the help Julian. > > By lineage we mean given a query as an input to the API , we would like to > get *input table list* and *output table* as the output (Intermediate > temp tables are optional in lineage). Below are few examples in hive and in > vertica > > *Hive Example :* > FROM page_view_stg pvs > INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country) > SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, > null, null, pvs.ip, pvs.cnt; > > *Expected Output:* > Input table list : page_view_stg > output table list: page_view > > *Vertica Example: * > CREATE TABLE TEST_LINEAGE AS > SELECT employee_first_name, employee_last_name, annual_salary, > employee_region > FROM employee_dimension WHERE annual_salary IN > (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY > employee_region) > UNSEGMENTED ALL NODES ; > > *Expected Output:* > Input Table list: employee_dimension > output table: TEST_LINEAGE > > So, we are trying to include it in Calcite and use it for our use case. > > > Thanks & Regards, > B Anil Kumar. > > On Wed, Apr 11, 2018 at 3:34 PM, Julian Hyde <[email protected]> wrote: > >> Thanks for the example. I’ve heard several definitions of “query lineage” >> over the years, so it really helps clarify. >> >> I’m pretty sure we can do this based SqlNode (we would need to parse and >> validate; without validation, you are never sure whether “table1” is the >> name of a table or a view or an alias somewhere else in the query). >> >> Can you come up two or three more examples? From these we can write some >> unit tests, and from unit tests we can implement a feature. >> >> Julian >> >> >> > On Apr 11, 2018, at 2:48 PM, AnilKumar B <[email protected]> wrote: >> > >> > Thanks for the quick response Enrico. >> > >> >>> Not sure if I understand correctly, but you are looking for something >> > that >> > given an SQL statement and metadata about existing tables it returns one >> > (or more) access plan ? >> > Yes, we are looking for table/DB object level plan >> > >> > Just for example, in the case below query, >> > >> > SqlNode node5 = getSqlParser("insert into table5 select a.c1, a.c2 >> > from (select * from table1 join table2 on table1.c1 = table2.c1 ) >> > tmp_table").parseStmt(); >> > >> > The table level lineage will be [table1, table2] -> [ tmp_table ] -> >> > [ table5 ]. And this we are trying to get from above sqlNode object. >> > >> > >> > May, I know, how to parse MySQL/Vertica/Hive queries? Is it through >> > Calcite JDBC? >> > >> > >> > >> > >> > >> > >> > Thanks & Regards, >> > B Anil Kumar. >> > >> > On Wed, Apr 11, 2018 at 1:45 PM, Enrico Olivelli <[email protected]> >> > wrote: >> > >> >> Il mer 11 apr 2018, 22:34 AnilKumar B <[email protected]> ha >> scritto: >> >> >> >>> Hi All, >> >>> >> >>> I am new to calcite. Need your help/inputs on usage of calcite to >> solve >> >> one >> >>> of our use case. >> >>> >> >>> We need to build a library to get the lineage graph of given query. >> Here >> >>> given query can be either Hive/ANSI SQL/Vertica queries etc. >> >>> >> >>> We can build this library by using their respective physical plans. >> But, >> >>> there are few issues as mentioned below >> >>> 1. We need develop the library specific to each source >> >>> 2. In most of the sources, there is a limitation on explain query for >> >>> *create >> >>> table* queries. >> >>> 3. To get the explain plan, we need to have all the tables (which we >> are >> >>> used in query) should be in the in their respective meta stores. So, >> it >> >>> might cause the issues with tmp tables from previous queries. >> >>> >> >>> So, to avoid above issues, we are trying to develop a generic library >> >> which >> >>> can work for any source. For that, we are trying to use calcite >> parser's >> >>> SQLNode to derive the lineage for given query. >> >>> >> >>> We started with ANSI SQL queries and as of now, it's going good. But >> any >> >>> inputs from community >> >>> on below will be great help for us. >> >>> 1. Does calcite is correct fit for our use case? >> >>> 2. How to parse the JDBC queries and get the lineage? >> >>> >> >> >> >> Not sure if I understand correctly, but you are looking for something >> that >> >> given an SQL statement and metadata about existing tables it returns >> one >> >> (or more) access plan ? >> >> In this case Calcite is very a good choice. I will be happy to provide >> some >> >> example if this is your case. >> >> >> >> Enrico >> >> >> >> >> >>> >> >>> Thanks & Regards, >> >>> B Anil Kumar. >> >>> >> >> -- >> >> >> >> >> >> -- Enrico Olivelli >> >> >> >> >
