[ 
https://issues.apache.org/jira/browse/FLINK-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li updated FLINK-15206:
-----------------------------
    Description: 
Currently, if users have both an online and an offline job with the same business 
logic in Flink SQL, their codebase is still not unified. They have to maintain two 
SQL statements whose only difference is the source (and/or sink) table, e.g.


{code:java}
// online job
insert into x select * from kafka_table;

// offline backfill job
insert into x select * from hive_table;
{code}

We would like to introduce a "dynamic catalog table". A dynamic catalog table 
acts like a view: it is an abstraction over the actual tables behind it, selected 
by configuration. When a job is executed, the dynamic catalog table resolves to 
an actual source table depending on that configuration.

A use case for this is the example given above - users want to keep just one 
SQL statement, {{insert into x select * from my_source_dynamic_table;}}. When 
executed in streaming mode, {{my_source_dynamic_table}} should point to a 
Kafka catalog table, and in batch mode it should point to a Hive catalog table.
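The intended resolution could be illustrated as follows. This is a conceptual sketch in Python, not Flink API; the mapping structure and all names in it are hypothetical.

```python
# Conceptual sketch: resolving a dynamic catalog table to an actual
# table based on execution mode. The mapping and function names are
# hypothetical illustrations, not Flink API.

DYNAMIC_TABLE_MAPPING = {
    "my_source_dynamic_table": {
        "streaming": "kafka_table",
        "batch": "hive_table",
    },
}

def resolve_table(name: str, mode: str) -> str:
    """Return the actual table a dynamic catalog table points to."""
    mapping = DYNAMIC_TABLE_MAPPING.get(name)
    if mapping is None:
        return name  # not a dynamic table; use the name as-is
    return mapping[mode]

def rewrite_statement(sql: str, mode: str) -> str:
    # Naive token-level rewrite, for illustration only.
    return " ".join(resolve_table(tok, mode) for tok in sql.split())

print(rewrite_statement("insert into x select * from my_source_dynamic_table", "streaming"))
# insert into x select * from kafka_table
```

With this, the same statement runs unchanged in both modes; only the resolution step differs.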

One thing to note is that the starting position of {{kafka_table}} and the 
starting/ending positions of {{hive_table}} are different for every run. This 
needs more thought on how we can accommodate it.
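One possible direction is to attach per-run boundary options at resolution time. A hedged sketch, again in Python and purely illustrative (the option keys mimic connector-style properties but are not proposed configuration keys):

```python
# Hypothetical sketch: a resolved source carries per-run boundary
# options alongside the actual table name. Not Flink API.
from dataclasses import dataclass, field

@dataclass
class ResolvedSource:
    table: str
    options: dict = field(default_factory=dict)

def resolve_with_bounds(mode: str, run_date: str) -> ResolvedSource:
    if mode == "streaming":
        # e.g. resume from consumer group offsets each run
        return ResolvedSource("kafka_table", {"startup.position": "group-offsets"})
    # batch backfill: bound the Hive scan to the run's partition range
    return ResolvedSource(
        "hive_table",
        {"partition.start": run_date, "partition.end": run_date},
    )
```

The open question is where such per-run values would come from (job config, catalog metadata, or the scheduler).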


> support dynamic catalog table for truly unified SQL job
> -------------------------------------------------------
>
>                 Key: FLINK-15206
>                 URL: https://issues.apache.org/jira/browse/FLINK-15206
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Major
>             Fix For: 1.11.0
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
