[ https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557379#comment-17557379 ]
Aitozi commented on FLINK-28187:
--------------------------------

Yes, I generate the JobId in advance to help deduplicate the job submission [link|https://github.com/apache/flink-kubernetes-operator/blob/91753ec5cef1aef85ff3884197e75fa25f7f6625/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/FlinkService.java#L215]

But I think the problem is that if the job submission fails, the reconcile spec is not stored, so the jobId is not stored either.

> Duplicate job submission for FlinkSessionJob
> --------------------------------------------
>
> Key: FLINK-28187
> URL: https://issues.apache.org/jira/browse/FLINK-28187
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.0.0
> Reporter: Jeesmon Jacob
> Priority: Critical
> Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g.
> concurrent.TimeoutException) is hit, the operator will submit the job again.
> But the first submission could have succeeded on the JobManager side, and the
> second submission could result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that in case a deployment error was hit, the
> SessionJobObserver will not be able to tell whether it has submitted the job
> or not. So it will simply try to submit it again. We have to find a mechanism
> to correlate Jobs on the cluster with the SessionJob CR itself. Maybe we
> could override the job name itself for this purpose or something like that.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
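
To make the correlation idea above concrete, here is a minimal sketch of one possible approach. It is not the operator's actual code or the fix adopted for this issue. It assumes the job ID can be derived deterministically from the SessionJob CR (here its UID plus a spec generation) rather than generated and stored, and that the cluster refuses a second submission that reuses an existing job ID. All names (DeterministicJobIdSketch, jobIdFor, FakeSessionCluster) are hypothetical, and the cluster is an in-memory stand-in.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; none of these names come from the operator code base. */
final class DeterministicJobIdSketch {

    /** Derives a stable 32-character hex ID from the CR's UID and spec generation. */
    static String jobIdFor(String crUid, long generation) {
        String seed = crUid + "/" + generation;
        UUID uuid = UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8));
        return String.format(
                "%016x%016x",
                uuid.getMostSignificantBits(),
                uuid.getLeastSignificantBits());
    }

    /** In-memory stand-in for a session cluster that rejects an already known job ID. */
    static final class FakeSessionCluster {
        private final Map<String, String> jobs = new HashMap<>();

        /** Returns true if the job was newly accepted, false if the ID already exists. */
        boolean submit(String jobId, String jobName) {
            return jobs.putIfAbsent(jobId, jobName) == null;
        }

        int jobCount() {
            return jobs.size();
        }
    }

    public static void main(String[] args) {
        FakeSessionCluster cluster = new FakeSessionCluster();
        String crUid = "example-cr-uid"; // hypothetical CR UID

        // First attempt: the cluster accepts the job, but assume the client
        // then hits a timeout and never records that the submission succeeded.
        boolean first = cluster.submit(jobIdFor(crUid, 1), "example-job");

        // Retry: the same CR and generation yield the same ID, so the cluster
        // can recognise the job instead of starting a duplicate.
        boolean retry = cluster.submit(jobIdFor(crUid, 1), "example-job");

        System.out.println(first + " " + retry + " " + cluster.jobCount());
    }
}
```

Because the ID is recomputed from the CR on every reconcile, nothing has to be persisted before the submission call returns, which is the gap described in the comment above. Whether the UID and generation are the right inputs is an open design question.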