[jira] [Commented] (FLINK-28187) Duplicate job submission for FlinkSessionJob

Gyula Fora (Jira) Wed, 22 Jun 2022 22:24:26 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557814#comment-17557814
 ]


Gyula Fora commented on FLINK-28187:
------------------------------------

Thanks for the comment. I agree that identifying failed first deployments is 
slightly tricky, but this doesn't really affect the general requirement:

 1.  Have a way to detect in the FlinkService whether a job for this resource 
is already running, and throw an error if so -> never allow double submission
 2.  Have a way to detect in the Observer whether an upgrade has already 
happened, and update the lastReconciledSpec accordingly

For Deployments, 1) is provided by Flink itself and 2) is basically covered in 
the commit I sent. For session jobs we need to cover both using the jobid magic 
somehow :) 
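
To make the "jobid magic" idea concrete, here is a minimal sketch, not the 
operator's actual implementation. It assumes the job id can be derived 
deterministically from the resource's UID and spec generation, so that a 
retried submission reuses the same id and can be rejected. The names 
SessionJobIdSketch, deterministicJobId and FakeCluster are hypothetical, and 
FakeCluster is only an in-memory stand-in for the session cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class SessionJobIdSketch {

    /**
     * Derives a stable 32-character hex id from the resource UID and spec
     * generation (both assumed inputs). Same inputs always give the same id.
     */
    static String deterministicJobId(String resourceUid, long generation) {
        String key = resourceUid + "#" + generation;
        UUID uuid = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
        return String.format("%016x%016x",
                uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
    }

    /** In-memory stand-in for a session cluster that tracks submitted job ids. */
    static final class FakeCluster {
        private final Set<String> jobIds = new HashSet<>();

        /** Requirement 1: refuse a second submission with the same id. */
        void submit(String jobId) {
            if (!jobIds.add(jobId)) {
                throw new IllegalStateException("Job " + jobId + " already submitted");
            }
        }

        /** Requirement 2: lets an observer check whether the submission already happened. */
        boolean hasJob(String jobId) {
            return jobIds.contains(jobId);
        }
    }
}
```

With this shape, a submission that timed out on the operator side but 
succeeded on the cluster would be visible through hasJob(...) on the next 
reconcile, and a blind retry would fail instead of creating a duplicate.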

> Duplicate job submission for FlinkSessionJob
> --------------------------------------------
>
>                 Key: FLINK-28187
>                 URL: https://issues.apache.org/jira/browse/FLINK-28187
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.0.0
>            Reporter: Jeesmon Jacob
>            Priority: Critical
>         Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g. 
> concurrent.TimeoutException) is hit, the operator will submit the job again. 
> But the first submission could have succeeded on the JobManager side, so the 
> second submission could result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that in case a deployment error was hit, the 
> SessionJobObserver will not be able to tell whether it has submitted the job 
> or not. So it will simply try to submit it again. We have to find a mechanism 
> to correlate Jobs on the cluster with the SessionJob CR itself. Maybe we 
> could override the job name itself for this purpose or something like that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
