[
https://issues.apache.org/jira/browse/FLINK-29109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637616#comment-17637616
]
Gyula Fora commented on FLINK-29109:
------------------------------------
[~thw], looking at this again, it seems that we might need this logic for all
Flink versions, not just those before 1.16. In 1.16 you get a generated jobId
based on the cluster id, but in our case that is also fixed, so it will lead to
the same issues.
What do you think?
> Checkpoint path conflict with stateless upgrade mode
> ----------------------------------------------------
>
> Key: FLINK-29109
> URL: https://issues.apache.org/jira/browse/FLINK-29109
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.1.0
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.2.0
>
>
> A stateful job with stateless upgrade mode (yes, there are such use cases)
> fails with a checkpoint path conflict due to the constant jobId and
> FLINK-19358 (applies to Flink versions before 1.16). Since the checkpoint id
> resets on restart in stateless upgrade mode, the job writes to previously
> used locations and fails. The workaround is to rotate the jobId on every
> redeploy when the upgrade mode is stateless. While this can be worked around
> externally, it is best done in the operator itself: reconciliation determines
> when a restart is actually required, whereas rotating the jobId externally
> may trigger unnecessary restarts.
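
A minimal sketch of the rotation idea described above, not the operator's
actual implementation. It assumes the jobId is derived deterministically from
the cluster id, and that a redeploy counter is available to salt it. The names
JobIdRotation, UpgradeMode, deriveJobId, jobIdForDeployment and redeployCount
are hypothetical and do not come from Flink or the operator.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical illustration only: none of these names exist in Flink or the operator.
final class JobIdRotation {

    // Simplified stand-in for the upgrade modes mentioned in the issue.
    enum UpgradeMode { STATELESS, SAVEPOINT, LAST_STATE }

    // Derives a 32-character hex id from the cluster id and a generation number.
    // The same inputs always produce the same id.
    static String deriveJobId(String clusterId, long generation) {
        byte[] seed = (clusterId + "/" + generation).getBytes(StandardCharsets.UTF_8);
        UUID uuid = UUID.nameUUIDFromBytes(seed);
        return String.format(
                "%016x%016x",
                uuid.getMostSignificantBits(),
                uuid.getLeastSignificantBits());
    }

    // Rotates the id on every redeploy only when the upgrade mode is stateless.
    // For other modes the generation is pinned to 0, so the id stays constant
    // and state written under the previous id can still be located.
    static String jobIdForDeployment(String clusterId, UpgradeMode mode, long redeployCount) {
        long generation = (mode == UpgradeMode.STATELESS) ? redeployCount : 0L;
        return deriveJobId(clusterId, generation);
    }

    public static void main(String[] args) {
        String clusterId = "example-cluster"; // assumed, fixed cluster id
        for (long redeploy = 1; redeploy <= 3; redeploy++) {
            System.out.println(
                    "stateless redeploy " + redeploy + ": "
                            + jobIdForDeployment(clusterId, UpgradeMode.STATELESS, redeploy));
        }
        System.out.println(
                "savepoint redeploy 1: "
                        + jobIdForDeployment(clusterId, UpgradeMode.SAVEPOINT, 1));
        System.out.println(
                "savepoint redeploy 2: "
                        + jobIdForDeployment(clusterId, UpgradeMode.SAVEPOINT, 2));
    }
}
```

With a fixed cluster id, the stateless redeploys each get a different id, so
checkpoints restarting from id 1 land under a fresh path. Because the counter
only advances when the operator itself redeploys, no extra restarts are
triggered from outside.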