[ 
https://issues.apache.org/jira/browse/FLINK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946842#comment-15946842
 ] 

ASF GitHub Bot commented on FLINK-6213:
---------------------------------------

GitHub user barcahead opened a pull request:

    https://github.com/apache/flink/pull/3640

    [FLINK-6213] [yarn] terminate resource manager itself when shutting down 
application

    When the number of failed containers exceeds the maximum number of failed 
containers, `YarnFlinkResourceManager` receives a `StopCluster` message and 
then invokes `shutdownApplication`. This method calls 
`amrmclient.unregisterApplicationMaster` to finish the application. However, 
the AM container is not released until 10 minutes later, when the release is 
triggered by the RM ping check timeout. 
    I fix this issue by having the resource manager terminate itself after 
unregistering the application master, so that the process exits and the 
container is released.
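
    The sequence described above can be illustrated with a minimal sketch. 
This is not the code in the patch: `AmClient`, `ResourceManagerSketch`, and the 
`terminateSelf` callback are hypothetical stand-ins, used here only to show the 
intended ordering (unregister first, then always terminate), under the 
assumption that termination should still happen when unregistering fails.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownSketch {

    /** Hypothetical stand-in for the YARN AM-RM client; not the real Hadoop API. */
    interface AmClient {
        void unregisterApplicationMaster(String diagnostics) throws Exception;
    }

    /** Hypothetical resource manager reduced to the shutdown path only. */
    static final class ResourceManagerSketch {
        private final AmClient amClient;
        private final Runnable terminateSelf; // assumed hook that stops the resource manager

        ResourceManagerSketch(AmClient amClient, Runnable terminateSelf) {
            this.amClient = amClient;
            this.terminateSelf = terminateSelf;
        }

        void shutdownApplication(String diagnostics) {
            try {
                // Step 1: tell YARN that the application has finished.
                amClient.unregisterApplicationMaster(diagnostics);
            } catch (Exception e) {
                System.err.println("Could not unregister the application master: " + e);
            } finally {
                // Step 2: terminate the resource manager itself so that the
                // process can exit instead of waiting for a timeout.
                terminateSelf.run();
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ResourceManagerSketch rm = new ResourceManagerSketch(
                diagnostics -> events.add("unregister: " + diagnostics),
                () -> events.add("terminate"));
        rm.shutdownApplication("too many failed containers");
        System.out.println(events);
    }
}
```

    The `finally` block is a design choice of this sketch: it keeps a failed 
unregister call from leaving the process alive.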


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/barcahead/flink FLINK-6213

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3640.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3640
    
----
commit 1f4c91af090189d8a797a500701689b6639c4a85
Author: fengyelei <[email protected]>
Date:   2017-03-29T03:40:24Z

    [FLINK-6213] [yarn] terminate resource manager itself when shutting down 
application

----


> When number of failed containers exceeds maximum failed containers and 
> application is stopped, the AM container will be released 10 minutes later 
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-6213
>                 URL: https://issues.apache.org/jira/browse/FLINK-6213
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Yelei Feng
>
> When the number of failed containers exceeds the maximum number of failed 
> containers and the application is stopped, the AM container is released only 
> 10 minutes later. I checked the YARN log and found that the AM container is 
> not released after {{unregisterApplicationMaster}} is invoked. After 10 
> minutes, the release is triggered by the RM ping check timeout.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
