[
https://issues.apache.org/jira/browse/FLINK-20417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Metzger updated FLINK-20417:
-----------------------------------
Fix Version/s: (was: 1.12.0)
> Handle "Too old resource version" exception in Kubernetes watch more
> gracefully
> -------------------------------------------------------------------------------
>
> Key: FLINK-20417
> URL: https://issues.apache.org/jira/browse/FLINK-20417
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0, 1.11.2
> Reporter: Yang Wang
> Priority: Major
> Fix For: 1.11.3, 1.13.0
>
>
> Currently, when a watcher (pods watcher or ConfigMap watcher) is closed with
> an exception, we call {{WatchCallbackHandler#handleFatalError}}. This can
> cause the JobManager to terminate and then fail over.
> In most cases this is correct, but not for the "too old resource version"
> exception. See [1] for more information. This exception usually happens
> when the APIServer is restarted. In that case we only need to create a new
> watch and continue watching the pods/ConfigMaps. This would reduce the
> impact of a K8s cluster restart on the Flink cluster.
>
> This issue was inspired by this technical article[2]. Thanks to the
> engineers from Tencent for the debugging. Note that the article is written
> in Chinese.
>
> [1].
> [https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version]
> [2]. [https://cloud.tencent.com/developer/article/1731416]
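
The behaviour proposed in the description above could be sketched roughly as follows. This is only an illustration and not the actual Flink change: the class WatchRecoverySketch, the exception type WatchClosedException and the Action enum are all hypothetical stand-ins. It also assumes that the API server reports "too old resource version" with HTTP status 410 (Gone).

```java
// Sketch only: every name in this file is hypothetical and does not mirror
// Flink's or the Kubernetes client's real classes.
final class WatchRecoverySketch {

    /** Stand-in for the client exception delivered when a watch is closed. */
    static final class WatchClosedException extends RuntimeException {
        final int httpCode;

        WatchClosedException(int httpCode, String message) {
            super(message);
            this.httpCode = httpCode;
        }
    }

    /** What the caller should do after a watch has been closed with an exception. */
    enum Action { RECREATE_WATCH, FATAL_ERROR }

    /**
     * Decides how to react to a watch that was closed with an exception.
     * Assumption: "too old resource version" arrives as HTTP 410 (Gone).
     */
    static Action onWatchClosed(WatchClosedException cause) {
        if (cause.httpCode == java.net.HttpURLConnection.HTTP_GONE) {
            // Recoverable: the watch only needs to be created again.
            return Action.RECREATE_WATCH;
        }
        // Any other failure keeps the current behaviour (fatal error path).
        return Action.FATAL_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(onWatchClosed(
                new WatchClosedException(410, "too old resource version")));
        System.out.println(onWatchClosed(
                new WatchClosedException(500, "internal error")));
    }
}
```

Returning a decision instead of recreating the watch directly keeps the sketch free of any client dependency. In a real watcher the RECREATE_WATCH branch would open a new watch on the same resources.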
--
This message was sent by Atlassian Jira
(v8.3.4#803005)