Hi Julian,

May I know, how to parse MySQL/Vertica/Hive queries through Calcite? Is it
through Calcite JDBC?

Thanks & Regards,
B Anil Kumar.

On Wed, Apr 11, 2018 at 6:56 PM, AnilKumar B <akumarb2...@gmail.com> wrote:

> Thanks for the help Julian.
>
> By lineage we mean given a query as an input to the API , we would like to
> get *input table list* and *output table* as the output (Intermediate
> temp tables are optional in lineage). Below are few examples in hive and in
> vertica
>
> *Hive Example :*
> FROM page_view_stg pvs
> INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
>        SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url,
> null, null, pvs.ip, pvs.cnt;
>
> *Expected Output:*
> Input table list : page_view_stg
> output table list: page_view
>
> *Vertica Example: *
> CREATE TABLE TEST_LINEAGE AS
> SELECT employee_first_name, employee_last_name, annual_salary,
> employee_region
>     FROM employee_dimension WHERE annual_salary IN
>      (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY
> employee_region)
>    UNSEGMENTED ALL NODES ;
>
> *Expected Output:*
> Input Table list: employee_dimension
> output table: TEST_LINEAGE
>
> So, we are trying to include it in Calcite and use it for our use case.
>
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Wed, Apr 11, 2018 at 3:34 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> Thanks for the example. I’ve heard several definitions of “query lineage”
>> over the years, so it really helps clarify.
>>
>> I’m pretty sure we can do this based SqlNode (we would need to parse and
>> validate; without validation, you are never sure whether “table1” is the
>> name of a table or a view or an alias somewhere else in the query).
>>
>> Can you come up two or three more examples? From these we can write some
>> unit tests, and from unit tests we can implement a feature.
>>
>> Julian
>>
>>
>> > On Apr 11, 2018, at 2:48 PM, AnilKumar B <akumarb2...@gmail.com> wrote:
>> >
>> > Thanks for the quick response Enrico.
>> >
>> >>> Not sure if I understand correctly, but you are looking for something
>> > that
>> > given an SQL statement and metadata about existing tables it returns one
>> > (or more) access plan ?
>> > Yes, we are looking for table/DB object level plan
>> >
>> > Just for example, in the case below query,
>> >
>> > SqlNode node5 = getSqlParser("insert into table5 select a.c1, a.c2
>> > from (select * from table1 join table2 on table1.c1 = table2.c1 )
>> > tmp_table").parseStmt();
>> >
>> > The table level lineage will be [table1, table2]  -> [ tmp_table ] ->
>> > [ table5 ]. And this we are trying to get from above sqlNode object.
>> >
>> >
>> > May, I know, how to parse MySQL/Vertica/Hive queries? Is it through
>> > Calcite JDBC?
>> >
>> >
>> >
>> >
>> >
>> >
>> > Thanks & Regards,
>> > B Anil Kumar.
>> >
>> > On Wed, Apr 11, 2018 at 1:45 PM, Enrico Olivelli <eolive...@gmail.com>
>> > wrote:
>> >
>> >> Il mer 11 apr 2018, 22:34 AnilKumar B <akumarb2...@gmail.com> ha
>> scritto:
>> >>
>> >>> Hi All,
>> >>>
>> >>> I am new to calcite. Need your help/inputs on usage of calcite to
>> solve
>> >> one
>> >>> of our use case.
>> >>>
>> >>> We need to build a library to get the lineage graph of given query.
>> Here
>> >>> given query can be either Hive/ANSI SQL/Vertica queries etc.
>> >>>
>> >>> We can build this library by using their respective physical plans.
>> But,
>> >>> there are few issues as  mentioned below
>> >>> 1. We need develop the library specific to each source
>> >>> 2. In most of the sources, there is a limitation on explain query for
>> >>> *create
>> >>> table* queries.
>> >>> 3. To get the explain plan, we need to have all the tables (which we
>> are
>> >>> used in query) should be in the in their respective meta stores. So,
>> it
>> >>> might cause the issues with tmp tables from previous queries.
>> >>>
>> >>> So, to avoid above issues, we are trying to develop a generic library
>> >> which
>> >>> can work for any source. For that, we are trying to use calcite
>> parser's
>> >>> SQLNode to derive the lineage for given query.
>> >>>
>> >>> We started with ANSI SQL queries and as of now, it's going good. But
>> any
>> >>> inputs from community
>> >>> on below will be great help for us.
>> >>> 1. Does calcite is correct fit for our use case?
>> >>> 2. How to parse the JDBC queries and get the lineage?
>> >>>
>> >>
>> >> Not sure if I understand correctly, but you are looking for something
>> that
>> >> given an SQL statement and metadata about existing tables it returns
>> one
>> >> (or more) access plan ?
>> >> In this case Calcite is very a good choice. I will be happy to provide
>> some
>> >> example if this is your case.
>> >>
>> >> Enrico
>> >>
>> >>
>> >>>
>> >>> Thanks & Regards,
>> >>> B Anil Kumar.
>> >>>
>> >> --
>> >>
>> >>
>> >> -- Enrico Olivelli
>> >>
>>
>>
>

Reply via email to