[ https://issues.apache.org/jira/browse/FLINK-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050674#comment-17050674 ]

Yangze Guo commented on FLINK-16299:
------------------------------------

After a deeper investigation, I believe this is actually not an issue.

When the application master (the job manager in Flink) fails over, only 
containers in the "RUNNING" state are kept; the others are killed and 
therefore will not be returned by {{getContainersFromPreviousAttempts}}. 
Currently, this logic exists in all of Yarn's internal schedulers, so there is 
no need to worry about a container leak in this scenario.

> Release containers recovered from previous attempt in which TaskExecutor is 
> not started.
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-16299
>                 URL: https://issues.apache.org/jira/browse/FLINK-16299
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: Xintong Song
>            Assignee: Yangze Guo
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in FLINK-16215, on Yarn deployment, {{YarnResourceManager}} 
> starts a new {{TaskExecutor}} in two steps:
>  # Request a new container from Yarn
>  # Start a {{TaskExecutor}} process in the allocated container
> If a JM failover happens between the two steps, the new attempt's 
> {{YarnResourceManager}} will not start {{TaskExecutor}} processes in the 
> recovered containers. That means such containers are neither used nor 
> released.
> A potential fix is to query the container status by calling 
> {{NMClientAsync#getContainerStatusAsync}}, release the containers whose 
> state is {{NEW}}, and keep only those whose state is {{RUNNING}}, waiting 
> for their {{TaskExecutor}} processes to register.
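The filtering step proposed above can be sketched as plain Java. This is not Flink's actual implementation; the class, container ids, and `ContainerState` enum below are illustrative stand-ins for the corresponding YARN types, assuming the state of each recovered container has already been fetched (e.g. via {{NMClientAsync#getContainerStatusAsync}}).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the recovery decision: after a JM failover, keep
// only recovered containers whose state is RUNNING (their TaskExecutors may
// still register) and release those whose state is NEW (TaskExecutor was
// never started, so the container would otherwise leak).
public class RecoveredContainerFilter {

    // Illustrative stand-in for YARN's container state.
    enum ContainerState { NEW, RUNNING, COMPLETE }

    /** Returns the ids of containers to keep; all others should be released. */
    static List<String> keepRunning(Map<String, ContainerState> recovered) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, ContainerState> e : recovered.entrySet()) {
            if (e.getValue() == ContainerState.RUNNING) {
                kept.add(e.getKey()); // keep and wait for registration
            }
            // NEW or COMPLETE: release the container back to YARN
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, ContainerState> recovered = new LinkedHashMap<>();
        recovered.put("container_01", ContainerState.RUNNING);
        recovered.put("container_02", ContainerState.NEW);
        System.out.println(keepRunning(recovered)); // prints [container_01]
    }
}
```

Per the comment above, YARN's schedulers already kill non-RUNNING containers on AM failover, so this filter would be redundant in practice; it only illustrates the logic the issue proposed.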



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
