Thanks for the help Julian.

By lineage we mean given a query as an input to the API , we would like to
get *input table list* and *output table* as the output (Intermediate temp
tables are optional in lineage). Below are few examples in hive and in
vertica

*Hive Example :*
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url,
null, null, pvs.ip, pvs.cnt;

*Expected Output:*
Input table list : page_view_stg
output table list: page_view

*Vertica Example: *
CREATE TABLE TEST_LINEAGE AS
SELECT employee_first_name, employee_last_name, annual_salary,
employee_region
    FROM employee_dimension WHERE annual_salary IN
     (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY
employee_region)
   UNSEGMENTED ALL NODES ;

*Expected Output:*
Input Table list: employee_dimension
output table: TEST_LINEAGE

So, we are trying to include it in Calcite and use it for our use case.


Thanks & Regards,
B Anil Kumar.

On Wed, Apr 11, 2018 at 3:34 PM, Julian Hyde <jh...@apache.org> wrote:

> Thanks for the example. I’ve heard several definitions of “query lineage”
> over the years, so it really helps clarify.
>
> I’m pretty sure we can do this based SqlNode (we would need to parse and
> validate; without validation, you are never sure whether “table1” is the
> name of a table or a view or an alias somewhere else in the query).
>
> Can you come up two or three more examples? From these we can write some
> unit tests, and from unit tests we can implement a feature.
>
> Julian
>
>
> > On Apr 11, 2018, at 2:48 PM, AnilKumar B <akumarb2...@gmail.com> wrote:
> >
> > Thanks for the quick response Enrico.
> >
> >>> Not sure if I understand correctly, but you are looking for something
> > that
> > given an SQL statement and metadata about existing tables it returns one
> > (or more) access plan ?
> > Yes, we are looking for table/DB object level plan
> >
> > Just for example, in the case below query,
> >
> > SqlNode node5 = getSqlParser("insert into table5 select a.c1, a.c2
> > from (select * from table1 join table2 on table1.c1 = table2.c1 )
> > tmp_table").parseStmt();
> >
> > The table level lineage will be [table1, table2]  -> [ tmp_table ] ->
> > [ table5 ]. And this we are trying to get from above sqlNode object.
> >
> >
> > May, I know, how to parse MySQL/Vertica/Hive queries? Is it through
> > Calcite JDBC?
> >
> >
> >
> >
> >
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
> > On Wed, Apr 11, 2018 at 1:45 PM, Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> >
> >> Il mer 11 apr 2018, 22:34 AnilKumar B <akumarb2...@gmail.com> ha
> scritto:
> >>
> >>> Hi All,
> >>>
> >>> I am new to calcite. Need your help/inputs on usage of calcite to solve
> >> one
> >>> of our use case.
> >>>
> >>> We need to build a library to get the lineage graph of given query.
> Here
> >>> given query can be either Hive/ANSI SQL/Vertica queries etc.
> >>>
> >>> We can build this library by using their respective physical plans.
> But,
> >>> there are few issues as  mentioned below
> >>> 1. We need develop the library specific to each source
> >>> 2. In most of the sources, there is a limitation on explain query for
> >>> *create
> >>> table* queries.
> >>> 3. To get the explain plan, we need to have all the tables (which we
> are
> >>> used in query) should be in the in their respective meta stores. So, it
> >>> might cause the issues with tmp tables from previous queries.
> >>>
> >>> So, to avoid above issues, we are trying to develop a generic library
> >> which
> >>> can work for any source. For that, we are trying to use calcite
> parser's
> >>> SQLNode to derive the lineage for given query.
> >>>
> >>> We started with ANSI SQL queries and as of now, it's going good. But
> any
> >>> inputs from community
> >>> on below will be great help for us.
> >>> 1. Does calcite is correct fit for our use case?
> >>> 2. How to parse the JDBC queries and get the lineage?
> >>>
> >>
> >> Not sure if I understand correctly, but you are looking for something
> that
> >> given an SQL statement and metadata about existing tables it returns one
> >> (or more) access plan ?
> >> In this case Calcite is very a good choice. I will be happy to provide
> some
> >> example if this is your case.
> >>
> >> Enrico
> >>
> >>
> >>>
> >>> Thanks & Regards,
> >>> B Anil Kumar.
> >>>
> >> --
> >>
> >>
> >> -- Enrico Olivelli
> >>
>
>

Reply via email to