Aitozi created FLINK-24713:
------------------------------
Summary: Postpone resourceManager serving after the recovery phase
has finished
Key: FLINK-24713
URL: https://issues.apache.org/jira/browse/FLINK-24713
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Affects Versions: 1.14.1
Reporter: Aitozi
When the ResourceManager starts, the JobManager connects to it. This means the
ResourceManager immediately begins trying to serve the resource requests coming
from the SlotManager.
If the ResourceManager fails over, it tries to recover the pods / containers
from the previous attempt, but new resource requirements may arrive before the
old TaskManagers have registered with the SlotManager.
In this case, the number of TaskManagers requested may be doubled when the
JobManager fails over.
We may need a mechanism to postpone ResourceManager serving until the recovery
phase has finished.
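
One possible shape for such a mechanism, as a minimal sketch only: buffer the
latest resource requirement while recovery is in progress, count the
TaskManagers that re-register, and request only the shortfall once recovery is
declared finished. The class DeferredRequirementGate and all of its methods are
hypothetical names for illustration and are not part of Flink's API. How
"recovery finished" is detected (for example, all recovered workers registered,
or a timeout) is also an assumption left open here.

```java
/**
 * Hypothetical sketch, not Flink code: defers acting on resource
 * requirements until the recovery phase has finished.
 */
final class DeferredRequirementGate {
    private boolean recoveryFinished = false;
    private int registeredWorkers = 0;   // TaskManagers recovered from the previous attempt
    private int requiredWorkers = 0;     // latest requirement seen from the SlotManager
    private int requestedNewWorkers = 0; // new workers actually requested so far

    /** Called when a recovered TaskManager re-registers. */
    synchronized void onRecoveredWorkerRegistered() {
        registeredWorkers++;
    }

    /** Called when a new requirement arrives; only recorded while recovering. */
    synchronized void onRequirements(int required) {
        requiredWorkers = required;
        if (recoveryFinished) {
            allocateMissing();
        }
    }

    /** Called once the recovery phase is considered complete (assumed signal). */
    synchronized void onRecoveryFinished() {
        recoveryFinished = true;
        allocateMissing();
    }

    /** Requests only the workers not already covered by recovered or requested ones. */
    private void allocateMissing() {
        int missing = requiredWorkers - registeredWorkers - requestedNewWorkers;
        if (missing > 0) {
            requestedNewWorkers += missing;
        }
    }

    synchronized int requestedNewWorkers() {
        return requestedNewWorkers;
    }
}
```

With this gate, a requirement for 3 workers that arrives before 3 recovered
TaskManagers have re-registered results in no new requests, whereas serving it
immediately would have requested 3 additional workers.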