[jira] [Commented] (FLINK-28187) Duplicate job submission for FlinkSessionJob

Aitozi (Jira) Wed, 22 Jun 2022 05:08:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557404#comment-17557404
 ]


Aitozi commented on FLINK-28187:
--------------------------------

I see what you mean now, but I think this case is a bit different from 
FlinkDeployment. With a FlinkDeployment, we can fetch the deployment first and 
then compare the generation. 
In session job mode, the JobID is the unique key. If the spec changes, a 
different JobID is generated and the old job becomes orphaned. I think the 
spec can change between a failed job submission and the point where the job 
is observed (although the probability is small). 

Can we generate the JobID from the uid of the resource? That way, one CR will 
have the same JobID throughout its lifetime, and we can always look up the job 
by that same JobID.
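
A minimal sketch of the idea, assuming the resource uid is a standard UUID 
string (as Kubernetes metadata.uid values are) and that the target is a 
32-character hex string, which is the width of a Flink JobID in hex form. The 
class and method names (StableJobId, fromResourceUid) are hypothetical and not 
part of the operator; how the result would be handed to Flink is left out.

```java
import java.util.UUID;

/** Hypothetical helper for illustration; not part of the Flink Kubernetes Operator. */
final class StableJobId {

    private StableJobId() {}

    /**
     * Maps a resource uid (assumed to be a UUID string) to 32 lowercase hex
     * characters. The mapping depends only on the uid, so the same CR always
     * yields the same value regardless of spec changes.
     */
    static String fromResourceUid(String resourceUid) {
        // Throws IllegalArgumentException if the uid is not a valid UUID string.
        UUID uuid = UUID.fromString(resourceUid);
        // %x prints a long as unsigned hex; 016 pads each half to 16 digits.
        return String.format("%016x%016x",
                uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        String uid = "123e4567-e89b-12d3-a456-426614174000"; // made-up uid
        System.out.println(fromResourceUid(uid));
    }
}
```

Because the value is derived only from the uid, a retry after a failed 
submission would ask for the same JobID, so the operator could first check 
whether a job with that ID already exists on the session cluster.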

> Duplicate job submission for FlinkSessionJob
> --------------------------------------------
>
>                 Key: FLINK-28187
>                 URL: https://issues.apache.org/jira/browse/FLINK-28187
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.0.0
>            Reporter: Jeesmon Jacob
>            Priority: Critical
>         Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g. 
> concurrent.TimeoutException) is hit, the operator will submit the job again. 
> But the first submission could have succeeded on the JobManager side, so the 
> second submission could result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that in case a deployment error was hit, the 
> SessionJobObserver will not be able to tell whether it has submitted the job 
> or not. So it will simply try to submit it again. We have to find a mechanism 
> to correlate Jobs on the cluster with the SessionJob CR itself. Maybe we 
> could override the job name itself for this purpose or something like that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
