[jira] [Updated] (FLINK-20417) Handle "Too old resource version" exception in Kubernetes watch more gracefully

Robert Metzger (Jira) Mon, 07 Dec 2020 22:40:17 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-20417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Metzger updated FLINK-20417:
-----------------------------------
    Fix Version/s:     (was: 1.12.0)

> Handle "Too old resource version" exception in Kubernetes watch more 
> gracefully
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-20417
>                 URL: https://issues.apache.org/jira/browse/FLINK-20417
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.0, 1.11.2
>            Reporter: Yang Wang
>            Priority: Major
>             Fix For: 1.11.3, 1.13.0
>
>
> Currently, when a watcher (pods watcher or ConfigMap watcher) is closed with 
> an exception, we call {{WatchCallbackHandler#handleFatalError}}. This can 
> cause the JobManager to terminate and then fail over.
> In most cases this is correct, but not for the "too old resource version" 
> exception. See [1] for more information. This exception usually occurs when 
> the APIServer is restarted. In that case we only need to create a new watch 
> and continue watching the pods/ConfigMaps. This would reduce the impact of a 
> K8s cluster restart on the Flink cluster.
>  
> This issue was inspired by a technical article[2]. Thanks to the folks at 
> Tencent for the debugging. Note that the article is written in Chinese.
>  
> [1]. [https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version]
> [2]. [https://cloud.tencent.com/developer/article/1731416]
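
The behaviour proposed in the description above could be sketched roughly as follows. This is only an illustration, not the actual Flink or Kubernetes client code: {{WatchClosedException}}, {{Callbacks}} and {{recreateWatch}} are hypothetical names introduced here, and the sketch assumes the "too old resource version" error can be recognised by the HTTP 410 (Gone) status that the Kubernetes API server returns for an expired resource version.

```java
import java.net.HttpURLConnection;

/** Sketch only: every type and method name in this class is hypothetical. */
final class WatchRecoverySketch {

    /** Hypothetical stand-in for the exception a watch is closed with. */
    static final class WatchClosedException extends RuntimeException {
        final int httpCode;

        WatchClosedException(int httpCode, String message) {
            super(message);
            this.httpCode = httpCode;
        }
    }

    /** Hypothetical callbacks; handleFatalError mirrors the name used in the issue. */
    interface Callbacks {
        void recreateWatch();

        void handleFatalError(Throwable cause);
    }

    /** Assumption: "too old resource version" surfaces as HTTP 410 (Gone). */
    static boolean isTooOldResourceVersion(Throwable cause) {
        return cause instanceof WatchClosedException
                && ((WatchClosedException) cause).httpCode == HttpURLConnection.HTTP_GONE;
    }

    /** Decides what to do when a watch is closed. */
    static void onWatchClosed(Throwable cause, Callbacks callbacks) {
        if (cause == null) {
            return; // closed normally, nothing to recover from
        }
        if (isTooOldResourceVersion(cause)) {
            // Recoverable: start a fresh watch instead of failing the JobManager.
            callbacks.recreateWatch();
        } else {
            // Any other exception keeps the current fatal-error behaviour.
            callbacks.handleFatalError(cause);
        }
    }
}
```

With this split, only the expired-resource-version case takes the recovery path; all other close reasons still reach {{handleFatalError}} as they do today.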



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
