[ https://issues.apache.org/jira/browse/FLINK-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bowen Li updated FLINK-15208:
-----------------------------
    Description: 
With dynamic catalog tables (FLINK-15206), users can maintain a single SQL job 
definition for both their online and offline jobs. However, they still need to 
change their configuration in order to submit the different jobs over time.

E.g., when users update the logic of their streaming job, they need to bootstrap 
both a new online job and a backfill offline job; let's call them sub-jobs of a 
job with a dynamic catalog table. They would have to (see the config sketch 
below): 
1) manually change the execution mode in the yaml config to "streaming", execute 
the SQL, and submit the streaming job 
2) manually change the execution mode in the yaml config to "batch", execute the 
SQL, and submit the batch job
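
For reference, today's manual toggle is the execution type in the SQL Client's 
YAML environment file (e.g. conf/sql-client-defaults.yaml; exact file and keys 
depend on the Flink version):

  execution:
    # change to "batch" for the backfill run, then execute the same SQL again
    type: streaming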

We should introduce a mechanism that allows users to submit all, or a subset of, 
the sub-jobs at once. In the backfill use case mentioned above, users should 
ideally execute the SQL just once and have Flink spin up the two jobs for them.
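
Purely as a hypothetical sketch of such a mechanism (the "types" key below does 
not exist today), the environment file could declare several execution modes, 
and the client would plan the SQL once and submit one sub-job per listed mode:

  execution:
    # hypothetical key: the client submits one sub-job per listed mode
    types: [streaming, batch]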

Streaming platforms at big companies like Uber and Netflix already do something 
like this for backfill use cases in one way or another - some do it in the UI, 
some in the planning phase. It would be great to standardize this practice and 
provide users with ultimate simplicity.

The assumption here is that users are fully aware of the consequences of 
launching two or more jobs at the same time, e.g. they need to handle 
overlapping results if there are any.

  was:
With dynamic catalog tables (FLINK-15206), users can maintain a single SQL job 
definition for both their online and offline jobs. However, they still need to 
change their configuration in order to submit the different jobs over time.

E.g., when users update the logic of their streaming job, they need to bootstrap 
both a new online job and a backfill offline job; let's call them sub-jobs of a 
job with a dynamic catalog table. They would have to 
1) manually change the execution mode in the yaml config to "streaming" and 
submit the streaming job 
2) manually change the execution mode in the yaml config to "batch" and submit 
the batch job

We should introduce a mechanism that allows users to submit all, or a subset of, 
the sub-jobs at once. In the backfill use case mentioned above, users should 
ideally execute the SQL just once and have Flink spin up the two jobs for them.

Streaming platforms at big companies like Uber and Netflix already do something 
like this for backfill use cases in one way or another - some do it in the UI, 
some in the planning phase. It would be great to standardize this practice and 
provide users with ultimate simplicity.

The assumption here is that users are fully aware of the consequences of 
launching two or more jobs at the same time, e.g. they need to handle 
overlapping results if there are any.


> client submits multiple sub-jobs for job with dynamic catalog table
> -------------------------------------------------------------------
>
>                 Key: FLINK-15208
>                 URL: https://issues.apache.org/jira/browse/FLINK-15208
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API, Table SQL / Client
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
