[ 
https://issues.apache.org/jira/browse/FLINK-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150863#comment-17150863
 ] 

Robert Metzger commented on FLINK-17177:
----------------------------------------

Looking at the code, it seems that we currently log every event, including
ERROR events, only at DEBUG level.

Maybe, as an intermediate step, we could log at WARN level whenever we receive
an error from K8s?
Otherwise, users may file error reports that will be hard to debug.
In the long run, this might also help us understand which types of errors K8s
is reporting here.
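A minimal sketch of the WARN-level idea, assuming a watcher that delivers events tagged with an action such as ERROR. The Action enum below is a hypothetical stand-in, loosely modeled on the fabric8 client's Watcher.Action, and the class and method names are illustrative rather than Flink code. It uses java.util.logging only to stay self-contained; Flink itself logs through SLF4J, where the two levels would be DEBUG and WARN instead of FINE and WARNING.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class WatchEventLoggingDemo {

    // Hypothetical stand-in for the event types a K8s watcher delivers.
    enum Action { ADDED, MODIFIED, DELETED, ERROR }

    private static final Logger LOG =
            Logger.getLogger(WatchEventLoggingDemo.class.getName());

    // java.util.logging has no DEBUG/WARN; FINE and WARNING play those roles here.
    static Level levelFor(Action action) {
        return action == Action.ERROR ? Level.WARNING : Level.FINE;
    }

    // Logs every event, but ERROR events become visible at the default log level.
    static void onEvent(Action action, String resourceName) {
        Level level = levelFor(action);
        if (action == Action.ERROR) {
            LOG.log(level, "Received ERROR event from Kubernetes for {0}", resourceName);
        } else {
            LOG.log(level, "Received {0} event for {1}", new Object[] {action, resourceName});
        }
    }

    public static void main(String[] args) {
        onEvent(Action.ADDED, "taskmanager-1-1"); // FINE: hidden at the default INFO level
        onEvent(Action.ERROR, "taskmanager-1-1"); // WARNING: printed to stderr
    }
}
```

Keeping non-error events at the lower level preserves the current DEBUG-only behavior for routine traffic while making the rare ERROR case show up in default logs.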

> Handle ERROR event correctly in KubernetesResourceManager#onError
> -----------------------------------------------------------------
>
>                 Key: FLINK-17177
>                 URL: https://issues.apache.org/jira/browse/FLINK-17177
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.10.0, 1.10.1
>            Reporter: Canbin Zheng
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Currently, once we receive an *ERROR* event sent from the K8s API server via
> the K8s {{Watcher}}, {{KubernetesResourceManager#onError}} handles it by
> calling {{KubernetesResourceManager#removePodIfTerminated}}. This may be
> incorrect, since the *ERROR* event may indicate an exception in the HTTP
> layer, which means the previously created {{Watcher}} may no longer be
> available and we should re-create the {{Watcher}} immediately.
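One way the proposed fix could look, sketched with hypothetical types: PodEventHandler, WatchHandle and the watch factory are illustrative names, not Flink or fabric8 APIs. On an ERROR event the handler closes the current watch and opens a new one instead of passing the event to the per-pod handling; every other event goes to a placeholder that stands in for the existing logic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class WatcherRecreationDemo {

    // Hypothetical event types, loosely modeled on a K8s watch.
    enum Action { ADDED, MODIFIED, DELETED, ERROR }

    // Hypothetical handle for an open watch; close() releases it.
    interface WatchHandle {
        void close();
    }

    static final class PodEventHandler {
        private final Supplier<WatchHandle> watchFactory; // opens a new watch
        private WatchHandle currentWatch;
        int handledPodEvents = 0;

        PodEventHandler(Supplier<WatchHandle> watchFactory) {
            this.watchFactory = watchFactory;
            this.currentWatch = watchFactory.get();
        }

        void onEvent(Action action, String podName) {
            if (action == Action.ERROR) {
                // The watch itself may be broken (e.g. an HTTP-layer failure),
                // so replace it rather than treating the event as a pod update.
                recreateWatch();
            } else {
                handlePodEvent(action, podName);
            }
        }

        private void recreateWatch() {
            currentWatch.close();
            currentWatch = watchFactory.get();
        }

        // Placeholder for the existing per-pod handling (hypothetical).
        private void handlePodEvent(Action action, String podName) {
            handledPodEvents++;
        }
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        PodEventHandler handler = new PodEventHandler(() -> {
            opened.incrementAndGet();
            return () -> { };
        });
        handler.onEvent(Action.ERROR, "taskmanager-1-1");
        System.out.println("watches opened: " + opened.get()); // prints "watches opened: 2"
    }
}
```

The constructor opens the first watch, and the single ERROR event replaces it, so two watches have been opened by the time main prints.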



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
