[
https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557404#comment-17557404
]
Aitozi commented on FLINK-28187:
--------------------------------
I see what you mean now, but I think this case is a bit different from the
FlinkDeployment case. With a FlinkDeployment, we can fetch the deployment first
and then compare the generation.
In session job mode, the JobID is the unique key. If the spec changes, a
different JobID is generated and the old job becomes orphaned. I think the spec
change can happen between the job submission failure and the job being
observed (although the probability is small).
Can we generate the JobID from the uid of the resource? That way, one CR keeps
the same JobID throughout its lifetime, and we can always look up the job by
that same JobID.
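
A minimal sketch of the uid-based idea, not the operator's actual
implementation. It assumes the resource uid is a standard Kubernetes UUID
string and that the target is a 32-character hex JobID; the class and method
names (SessionJobIds, jobIdHexFromUid) are hypothetical and only for
illustration.

```java
import java.util.UUID;

// Hypothetical helper: derives a stable, 32-character hex job ID from the
// uid of a custom resource. The same uid always yields the same ID, so the
// ID does not change when the spec of the resource changes.
final class SessionJobIds {

    private SessionJobIds() {}

    static String jobIdHexFromUid(String resourceUid) {
        // Assumption: the uid is a UUID string such as
        // "123e4567-e89b-12d3-a456-426614174000".
        UUID uuid = UUID.fromString(resourceUid);
        // A UUID is 128 bits, which maps onto two 64-bit halves.
        // %016x prints each half as zero-padded, unsigned hex.
        return String.format(
                "%016x%016x",
                uuid.getMostSignificantBits(),
                uuid.getLeastSignificantBits());
    }
}
```

Because the ID depends only on the uid, a retry after a failed submission
would target the same JobID, so a job that was in fact accepted by the
JobManager could be found again instead of being submitted a second time.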
> Duplicate job submission for FlinkSessionJob
> --------------------------------------------
>
> Key: FLINK-28187
> URL: https://issues.apache.org/jira/browse/FLINK-28187
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.0.0
> Reporter: Jeesmon Jacob
> Priority: Critical
> Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g.
> concurrent.TimeoutException) is hit, the operator will submit the job again.
> But the first submission could have succeeded on the JobManager side, so the
> second submission could result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that in case a deployment error was hit, the
> SessionJobObserver will not be able to tell whether it has submitted the job
> or not. So it will simply try to submit it again. We have to find a mechanism
> to correlate Jobs on the cluster with the SessionJob CR itself. Maybe we
> could override the job name itself for this purpose or something like that.