[jira] [Closed] (FLINK-36140) Log a warning when pods are terminated by kubernetes

Gabor Somogyi (Jira) Mon, 26 Aug 2024 04:50:04 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-36140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gabor Somogyi closed FLINK-36140.
---------------------------------

> Log a warning when pods are terminated by kubernetes
> ----------------------------------------------------
>
>                 Key: FLINK-36140
>                 URL: https://issues.apache.org/jira/browse/FLINK-36140
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.19.1
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>
> Scheduled maintenance or faulty nodes on Kubernetes can result in random pod 
> termination and, during a rolling restart of the Kubernetes cluster nodes, 
> in a series of job restarts. The larger the job, the higher the chance that 
> it is affected. Jobs should be able to recover from these events 
> automatically, but the restarts can cause unwanted turbulence in large-scale 
> pipelines. 
> In this case, it is very difficult to identify what is causing the restarts 
> without already knowing about the issue at the Kubernetes layer and the 
> keyword to search for, because the pod termination is logged at INFO level.
> We need to log this at a higher level. If changing it from INFO to ERROR 
> breaks monitoring, we should at least log it as a warning. 
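
To illustrate the request above, here is a minimal sketch of choosing the log level from the reason a pod stopped. It is not Flink's actual code: the class `PodTerminationLogging`, the enum `StopReason`, and the pod names are hypothetical, and it uses `java.util.logging` only to stay self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; these names do not exist in Flink. */
public class PodTerminationLogging {
    private static final Logger LOG =
            Logger.getLogger(PodTerminationLogging.class.getName());

    /** Assumed, simplified reasons why a worker pod stopped. */
    enum StopReason { REQUESTED_BY_FLINK, TERMINATED_BY_KUBERNETES }

    /** Terminations initiated by Kubernetes get WARNING; expected stops stay at INFO. */
    static Level levelFor(StopReason reason) {
        return reason == StopReason.TERMINATED_BY_KUBERNETES ? Level.WARNING : Level.INFO;
    }

    /** Builds a message with a stable keyword that operators can search for. */
    static String describe(String podName, StopReason reason) {
        return "Pod " + podName + " stopped: " + reason;
    }

    static void logPodStopped(String podName, StopReason reason) {
        LOG.log(levelFor(reason), describe(podName, reason));
    }

    public static void main(String[] args) {
        logPodStopped("taskmanager-1", StopReason.REQUESTED_BY_FLINK);
        logPodStopped("taskmanager-2", StopReason.TERMINATED_BY_KUBERNETES);
    }
}
```

With a mapping like this, a filter on WARNING and above would surface Kubernetes-initiated terminations without requiring the reader to know the exact INFO-level message in advance.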



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
