wawa created FLINK-33096:
----------------------------
Summary: Flink on k8s: if one TaskManager pod crashes, the whole
Flink job fails
Key: FLINK-33096
URL: https://issues.apache.org/jira/browse/FLINK-33096
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.14.3
Reporter: wawa
The Flink version is 1.14.3, and the job is submitted to Kubernetes using the
Native Kubernetes application mode. During the scheduling process, when a
TaskManager pod crashes due to an exception, Kubernetes attempts to start a
new TaskManager pod. However, scheduling halts immediately and the entire
Flink job is terminated. By contrast, if the JobManager pod crashes,
Kubernetes successfully schedules a new JobManager pod. We observed this
while running the application. Could you please help analyze the underlying
cause?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)