[
https://issues.apache.org/jira/browse/FLINK-26916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514248#comment-17514248
]
Gyula Fora commented on FLINK-26916:
------------------------------------
After some discussion with [~thw] it seems there is an inherent limitation in
the Kubernetes HA service that makes this nearly impossible to implement.
The Kubernetes HA service only stores a pointer to a file within the HA
storage dir. To get the actual checkpoint pointer, one would need to read the
CompletedCheckpoint object from the HA storage dir and extract the pointer
from it.
This would require access to the HA storage from within the operator which is
completely unfeasible.
We suggest changing the Flink Kubernetes HA service implementation to also
store the external checkpoint pointer in the same ConfigMap. This could be a
minimal, backward-compatible change that we should aim to get in for 1.15 and,
if simple enough, backport for the next 1.14 release.
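To illustrate the proposal, here is a minimal Java sketch of how the operator
could read the latest external checkpoint pointer straight from the ConfigMap
data, if the HA service wrote it there. Everything in it is an assumption made
for illustration: the class name, the key prefixes and the example paths are
hypothetical and are not the keys or formats the Flink Kubernetes HA service
actually uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CheckpointPointerSketch {
    // Hypothetical key prefixes, chosen for this sketch only.
    // Today's entry: an opaque handle to a file in the HA storage dir.
    static final String HANDLE_PREFIX = "checkpointID-";
    // Proposed extra entry: the external checkpoint pointer as plain text.
    static final String EXTERNAL_POINTER_PREFIX = "externalPointer-";

    /**
     * Returns the external pointer stored under the highest checkpoint ID,
     * or empty if the ConfigMap data contains no external pointer entries.
     */
    static Optional<String> latestExternalPointer(Map<String, String> configMapData) {
        long latestId = -1;
        String pointer = null;
        for (Map.Entry<String, String> entry : configMapData.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(EXTERNAL_POINTER_PREFIX)) {
                continue; // handle entries are useless without HA storage access
            }
            long id = Long.parseLong(key.substring(EXTERNAL_POINTER_PREFIX.length()));
            if (id > latestId) {
                latestId = id;
                pointer = entry.getValue();
            }
        }
        return Optional.ofNullable(pointer);
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        // Existing kind of entry: only resolvable by reading the HA storage dir.
        data.put(HANDLE_PREFIX + "41", "<serialized handle to a file in HA storage>");
        // Proposed additional entry: readable without any HA storage access.
        data.put(EXTERNAL_POINTER_PREFIX + "41", "s3://example-bucket/checkpoints/chk-41");

        System.out.println(latestExternalPointer(data).orElse("no external pointer"));
    }
}
```

With only the handle entry present, the lookup returns empty, which matches
the current situation: the operator cannot determine the checkpoint path
without reading the HA storage dir.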
Given this inherent limitation, we propose to add a big fat warning to the
last-state upgrade mode, point out that job changes are not possible, and
accept this as a limitation for the preview release.
[~wangyang0918]: Yang, you are familiar with the Kubernetes HA
implementation. Do you think we can reasonably make this change for 1.15? What
is your gut feeling?
> The Operator ignores job related changes (jar, parallelism) during last-state
> upgrades
> --------------------------------------------------------------------------------------
>
> Key: FLINK-26916
> URL: https://issues.apache.org/jira/browse/FLINK-26916
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-0.1.0, kubernetes-operator-1.0.0
> Reporter: Matyas Orhidi
> Assignee: Gyula Fora
> Priority: Critical
>
> RC: The old jobgraph is being reused when resuming