[jira] [Updated] (FLINK-15206) support dynamic catalog table for truly unified SQL job

Bowen Li (Jira) Fri, 20 Dec 2019 15:35:07 -0800


     [ https://issues.apache.org/jira/browse/FLINK-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bowen Li updated FLINK-15206:
-----------------------------
    Description: 
Currently, if users have both an online and an offline job with the same 
business logic in Flink SQL, their codebase is still not unified. They have to 
keep two SQL statements whose only difference is the source (and/or sink) table 
(with different params). E.g.


{code:sql}
-- online job
insert into x select * from kafka_table (starting time) ...;

-- offline backfill job
insert into x select * from hive_table  (starting and ending time) ...;
{code}

We would like to introduce a "dynamic catalog table". The dynamic catalog table 
acts as a view: it is an abstract table over multiple actual tables that can be 
switched via configuration flags. When a job is executed, the dynamic catalog 
table points to one actual source table, depending on the configuration.

A use case for this is the example given above - when executed in streaming 
mode, {{my_source_dynamic_table}} should point to a Kafka catalog table with a 
new starting position, and in batch mode, {{my_source_dynamic_table}} should 
point to a Hive catalog table with starting/ending positions.
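
The resolution step described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Flink code: {{ExecutionMode}}, {{DynamicCatalogTable}}, {{resolve}} and {{build_statement}} are hypothetical names invented for this sketch, and the table names are taken from the example above.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    STREAMING = "streaming"
    BATCH = "batch"


@dataclass
class DynamicCatalogTable:
    """Hypothetical abstract table mapping each execution mode to one physical table."""
    name: str
    targets: dict  # ExecutionMode -> physical table name

    def resolve(self, mode: ExecutionMode) -> str:
        # Fail loudly if no physical table is registered for this mode.
        if mode not in self.targets:
            raise KeyError(f"{self.name} has no table for mode {mode.value}")
        return self.targets[mode]


def build_statement(table: DynamicCatalogTable, mode: ExecutionMode) -> str:
    # One statement template; only the resolved source table differs per mode.
    return f"insert into x select * from {table.resolve(mode)}"


my_source_dynamic_table = DynamicCatalogTable(
    name="my_source_dynamic_table",
    targets={
        ExecutionMode.STREAMING: "kafka_table",
        ExecutionMode.BATCH: "hive_table",
    },
)
```

With this shape, the user keeps a single statement against {{my_source_dynamic_table}}, and the choice of physical table is made once, from the execution mode, when the job is submitted.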
 
One thing to note is that the starting position of kafka_table and the 
starting/ending positions of hive_table are different on every run. How to 
accommodate that needs more thought.
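
One possible direction, offered only as an illustration since the question is still open, is to pass the positions as per-run parameters next to the mode. In the Python sketch below, {{ScanRange}}, {{table_options}} and the option keys are all hypothetical and do not correspond to real connector options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanRange:
    """Hypothetical per-run read bounds; end stays None for an unbounded streaming read."""
    start: str
    end: Optional[str] = None


def table_options(mode: str, scan: ScanRange) -> dict:
    # The keys below are made up for illustration, not real connector options.
    if mode == "streaming":
        return {"table": "kafka_table", "start": scan.start}
    if mode == "batch":
        if scan.end is None:
            raise ValueError("batch backfill needs both a start and an end position")
        return {"table": "hive_table", "start": scan.start, "end": scan.end}
    raise ValueError(f"unknown mode: {mode}")
```

The table mapping stays fixed in the catalog under this approach, while the positions are supplied at submission time and therefore can differ on every run.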

  was:
Currently, if users have both an online and an offline job with the same 
business logic in Flink SQL, their codebase is still not unified. They have to 
keep two SQL statements whose only difference is the source (and/or sink) table 
(with different params). E.g.


{code:sql}
-- online job
insert into x select * from kafka_table (starting time) ...;

-- offline backfill job
insert into x select * from hive_table  (starting and ending time) ...;
{code}

We would like to introduce a "dynamic catalog table". The dynamic catalog table 
acts as a view, and is just an abstract table from actual tables behind it 
under some configurations. When a job is executed, depending on the 
configuration, the dynamic catalog table can point to an actual source table.

A use case for this is the example given above - when executed in streaming 
mode, {{my_source_dynamic_table}} should point to a Kafka catalog table with a 
new starting position, and in batch mode, {{my_source_dynamic_table}} should 
point to a Hive catalog table with starting/ending positions.
 
One thing to note is that the starting position of kafka_table and the 
starting/ending positions of hive_table are different on every run. How to 
accommodate that needs more thought.


> support dynamic catalog table for truly unified SQL job
> -------------------------------------------------------
>
>                 Key: FLINK-15206
>                 URL: https://issues.apache.org/jira/browse/FLINK-15206
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Currently, if users have both an online and an offline job with the same 
> business logic in Flink SQL, their codebase is still not unified. They have 
> to keep two SQL statements whose only difference is the source (and/or sink) 
> table (with different params). E.g.
> {code:sql}
> -- online job
> insert into x select * from kafka_table (starting time) ...;
> -- offline backfill job
> insert into x select * from hive_table  (starting and ending time) ...;
> {code}
> We would like to introduce a "dynamic catalog table". The dynamic catalog 
> table acts as a view: it is an abstract table over multiple actual tables 
> that can be switched via configuration flags. When a job is executed, the 
> dynamic catalog table points to one actual source table, depending on the 
> configuration.
> A use case for this is the example given above - when executed in streaming 
> mode, {{my_source_dynamic_table}} should point to a Kafka catalog table with 
> a new starting position, and in batch mode, {{my_source_dynamic_table}} 
> should point to a Hive catalog table with starting/ending positions.
>
> One thing to note is that the starting position of kafka_table and the 
> starting/ending positions of hive_table are different on every run. How to 
> accommodate that needs more thought.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
