[ 
https://issues.apache.org/jira/browse/FLINK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-24713:
---------------------------
    Description: 
When the ResourceManager starts, the JobManager connects to it, which means 
the ResourceManager immediately begins trying to serve the resource requests 
from the SlotManager.

If the ResourceManager fails over, it tries to recover the pods / containers 
from the previous attempt, but new resource requirements may arrive before 
the old TaskManagers have registered with the SlotManager. 

In this case, the number of requested TaskManagers may be doubled when the 
JobManager fails over. We may need a mechanism to postpone ResourceManager 
serving until after the recovery phase has finished.

  was:
When ResourceManager started, JobManger will connect to the ResourceManager, 
this means the ResourceManage will begin to try serve the resource requests 
from SlotManager.

If ResourceManager failover, although it will try to recover the pod / 
container from previous attempt, But new resource requirements may happen 
before the old taskManger register to slotManager. 

In this case, it may double the required taskManager when jobManager failover. 
We may need a mechanism to postpone resourceManager serving after the recovery 
phase has finished


> Postpone resourceManager serving after the recovery phase has finished
> ----------------------------------------------------------------------
>
>                 Key: FLINK-24713
>                 URL: https://issues.apache.org/jira/browse/FLINK-24713
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>
> When the ResourceManager starts, the JobManager connects to it, which means 
> the ResourceManager immediately begins trying to serve the resource requests 
> from the SlotManager.
> If the ResourceManager fails over, it tries to recover the pods / containers 
> from the previous attempt, but new resource requirements may arrive before 
> the old TaskManagers have registered with the SlotManager. 
> In this case, the number of requested TaskManagers may be doubled when the 
> JobManager fails over. We may need a mechanism to postpone ResourceManager 
> serving until after the recovery phase has finished.
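
The postponement described in the issue could look roughly like the sketch 
below. This is only an illustration under stated assumptions: the class 
RecoveryAwareAllocator and every method on it are hypothetical names and are 
not Flink's ResourceManager or SlotManager API. The recovery timeout is also 
an assumption, added so that a recovered worker that never comes back cannot 
block allocation forever.

```java
/** Hypothetical sketch; not Flink's actual ResourceManager or SlotManager API. */
class RecoveryAwareAllocator {
    private int pendingRecovered;     // workers from the previous attempt, not yet re-registered
    private int registered;           // workers currently registered
    private int requested;            // new workers requested but not yet registered
    private int required;             // latest total requirement declared by the job
    private boolean recoveryTimedOut; // assumed safety valve, see onRecoveryTimeout()

    RecoveryAwareAllocator(int recoveredWorkers) {
        this.pendingRecovered = recoveredWorkers;
    }

    /** Recovery ends when all previous workers re-registered or the timeout fired. */
    boolean isRecoveryFinished() {
        return recoveryTimedOut || pendingRecovered == 0;
    }

    /** Records the requirement; returns how many new workers are requested now. */
    int declareRequirement(int requiredWorkers) {
        required = requiredWorkers;
        return requestMissingWorkers();
    }

    /** A worker recovered from the previous attempt has registered again. */
    int onRecoveredWorkerRegistered() {
        if (pendingRecovered > 0) {
            pendingRecovered--;
        }
        registered++;
        return requestMissingWorkers();
    }

    /** A worker that was newly requested by this attempt has registered. */
    void onNewWorkerRegistered() {
        if (requested > 0) {
            requested--;
        }
        registered++;
    }

    /** Stop waiting for recovered workers that never came back. */
    int onRecoveryTimeout() {
        recoveryTimedOut = true;
        pendingRecovered = 0;
        return requestMissingWorkers();
    }

    /** While recovery is in progress, requirements are recorded but not acted on. */
    private int requestMissingWorkers() {
        if (!isRecoveryFinished()) {
            return 0;
        }
        int missing = Math.max(0, required - registered - requested);
        requested += missing;
        return missing;
    }
}
```

In this sketch, a job that needs two workers and has two recovered workers 
still pending triggers no new requests: the requirement is held back until 
both have registered, at which point nothing is missing. Without the 
postponement, the same requirement would have started two extra workers, 
which is the doubling described above.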



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
