Xintong Song created FLINK-21667:
------------------------------------
Summary: Standby RM might remove resources from Kubernetes
Key: FLINK-21667
URL: https://issues.apache.org/jira/browse/FLINK-21667
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.12.2
Reporter: Xintong Song
Fix For: 1.13.0, 1.12.3
Currently, on initialization, {{KubernetesResourceManagerDriver}} starts a watch
for receiving pod events. As a result, it may start receiving events before
obtaining leadership. Consequently, a standby RM may remove terminated pods
from Kubernetes while handling these events.
This is not very damaging at the moment, since the removed pods are already
terminated anyway. However, it would still be good for a standby RM to strictly
follow the contract and make no modifications before obtaining leadership. We
might consider postponing the start of the watch until leadership is granted.
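
A minimal sketch of the postponed-watch idea, to make the proposal concrete.
It is not Flink code: the class {{LeaderGatedPodWatcher}} and all of its
methods are hypothetical, and the Kubernetes watch is simulated by a plain
method call. The only point it illustrates is that nothing is registered
during initialization, so a standby instance never gets to handle pod events.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a resource manager driver; not Flink's actual class. */
class LeaderGatedPodWatcher {
    // True only after leadership has been granted and the watch was started.
    private boolean watchStarted = false;

    // Records the pods this instance asked the cluster to remove.
    private final List<String> removedPods = new ArrayList<>();

    /** Initialization deliberately does NOT start the pod watch. */
    void initialize() {
        // Clients and configuration would be set up here; no watch is registered.
    }

    /** Assumed leadership callback: the watch is started only at this point. */
    void onGrantLeadership() {
        watchStarted = true;
    }

    /** Simulates the cluster delivering a "pod terminated" event. */
    void deliverPodTerminated(String podName) {
        if (!watchStarted) {
            // No watch exists yet, so a standby instance never handles the event.
            return;
        }
        // Only the leader reaches this point and modifies the cluster.
        removedPods.add(podName);
    }

    /** Returns the names of the pods removed so far. */
    List<String> removedPods() {
        return removedPods;
    }
}
```

In this sketch, an event delivered between {{initialize()}} and
{{onGrantLeadership()}} is dropped without side effects, while the same event
delivered after leadership is granted leads to the pod being removed.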
--
This message was sent by Atlassian Jira
(v8.3.4#803005)