[ 
https://issues.apache.org/jira/browse/FLINK-16215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046062#comment-17046062
 ] 

Xintong Song commented on FLINK-16215:
--------------------------------------

[~liuyufei] 
I see your point. You mean that for FLINK-15959, in case of a failover it's 
hard to know how many new TMs we need to start to meet the minimum number of 
slots?

I did some investigation and came up with another potential solution for 
identifying the allocated-but-not-started containers among all recovered 
containers. We can request the container status from Yarn by calling 
{{NMClientAsync#getContainerStatusAsync}}; the result will be returned in 
{{onContainerStatusReceived}}. The {{ContainerStatus#getState}} should be 
{{RUNNING}} for those containers in which the {{TaskExecutor}} process has 
already been started, and {{NEW}} for those allocated but not yet started.
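
Roughly, the query side could look like the sketch below (not actual Flink 
code; {{recoveredContainers}} stands for whatever collection holds the 
containers handed back on AM registration, e.g. via 
{{RegisterApplicationMasterResponse#getContainersFromPreviousAttempts}}):

{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

import java.util.List;

// Sketch only: ask each NodeManager for the status of the recovered
// containers. The answers arrive asynchronously in
// NMClientAsync.CallbackHandler#onContainerStatusReceived.
void queryRecoveredContainerStatuses(
        NMClientAsync nmClientAsync, List<Container> recoveredContainers) {
    for (Container container : recoveredContainers) {
        nmClientAsync.getContainerStatusAsync(
                container.getId(), container.getNodeId());
    }
}
{code}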

In this way, we can release the containers that were allocated but not 
started, and also decide how many new TMs we need to start. An alternative 
would be to directly start the TM process inside those allocated-but-not-started 
containers. However, we are working on another effort to make the RM not assume 
that all TMs have the same resources, which means the RM cannot decide which 
resource configuration to start the new TMs with.
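
For reference, the callback side could then look something like this (again 
just a sketch; {{releaseContainer}} and {{requestNewWorker}} are hypothetical 
helpers standing in for the RM's actual bookkeeping):

{code:java}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Sketch only: classify a recovered container by its reported state.
@Override
public void onContainerStatusReceived(ContainerId containerId, ContainerStatus status) {
    if (status.getState() == ContainerState.NEW) {
        // Allocated but the TaskExecutor process was never started. Release
        // it, since the RM cannot know which resource spec it was meant
        // for, ...
        releaseContainer(containerId);
        // ... and request a replacement worker with a known resource spec.
        requestNewWorker();
    }
    // Containers in state RUNNING already host a TaskExecutor; we just wait
    // for it to re-register with the new ResourceManager leader.
}
{code}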

> Start redundant TaskExecutor when JM failed
> -------------------------------------------
>
>                 Key: FLINK-16215
>                 URL: https://issues.apache.org/jira/browse/FLINK-16215
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: YufeiLiu
>            Priority: Major
>
> TaskExecutor instances will reconnect to the new ResourceManager leader when 
> the JM fails, and the JobMaster will restart and reschedule the job. If the 
> job's slot requests arrive earlier than the TM registrations, the RM will 
> start new workers rather than reuse the existing TMs.
> It's hard to reproduce because the TM registrations usually come first, and 
> the timeout check will stop the redundant TMs.
> But I think it would be better to turn {{recoverWorkerNode}} into an 
> interface method and put the recovered slots into {{pendingSlots}} to wait 
> for the TM reconnection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
