xiechenling created FLINK-36451:
-----------------------------------
Summary: Kubernetes Application JobManager Potential Deadlock and
TaskManager Pod Residuals
Key: FLINK-36451
URL: https://issues.apache.org/jira/browse/FLINK-36451
Project: Flink
Issue Type: Bug
Affects Versions: 1.19.1
Environment: * Flink version: 1.19.1
* Deployment mode: Flink Kubernetes Application Mode
* JVM version: OpenJDK 17
Reporter: xiechenling
Attachments: 1.png, 2.png, jobmanager.log, jstack.txt
In Kubernetes Application Mode, when etcd latency is high or etcd is unstable,
the Flink JobManager may deadlock. In addition, TaskManager pods are not
cleaned up properly, leaving stale resources that prevent the Flink job from
recovering correctly. The issue occurs during frequent service restarts or
periods of network instability.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)