[
https://issues.apache.org/jira/browse/FLINK-36140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Somogyi closed FLINK-36140.
---------------------------------
> Log a warning when pods are terminated by kubernetes
> ----------------------------------------------------
>
> Key: FLINK-36140
> URL: https://issues.apache.org/jira/browse/FLINK-36140
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.19.1
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
>
> Scheduled maintenance or buggy nodes on Kubernetes can result in random pod
> termination and, through rolling restarts of the Kubernetes cluster nodes,
> eventually a series of job restarts. The larger the job, the higher the chance
> it is affected. The jobs should be able to auto-recover from these issues, but
> the restarts can cause unwanted turbulence in large-scale pipelines.
> In this case, it is very difficult to identify what is causing the restarts
> without knowing about the issue at the Kubernetes layer and the keyword to
> search for, because the pod termination is logged at INFO level.
> We need to log this at a higher level. If changing it from INFO to ERROR
> breaks monitoring, we should at least log it as a warning.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)