[ https://issues.apache.org/jira/browse/SAMZA-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892439#comment-15892439 ]

Jake Maes commented on SAMZA-1116:
----------------------------------

Thanks for reporting this [~danil]. 

I'm going to associate this ticket with SAMZA-871, as I think that's probably 
the best fix. 

I'm assuming there was only one RM and that YARN was not configured with RM HA. 
Is that right?  
(https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html)
 

With HA, this scenario should be less likely because all the RMs would need to 
be terminated before the AM would fail. If you don't have HA configured 
already, that may be one workaround until SAMZA-871 is implemented. 
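For reference, a minimal yarn-site.xml sketch of the RM HA workaround (the cluster ID, hostnames, and ZooKeeper quorum below are placeholders, not values from this report):

{code}
<configuration>
  <!-- Enable ResourceManager HA with two RMs -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm1.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm2.example.com</value>
  </property>
  <!-- ZooKeeper ensemble used for leader election and RM state -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
</configuration>
{code}

With this in place a single RM restart triggers a failover to the standby instead of leaving the AM without an RM to talk to.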

Thanks!

> Yarn RM recovery causing duplicate containers
> ---------------------------------------------
>
>                 Key: SAMZA-1116
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1116
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.11
>            Reporter: Danil Serdyuchenko
>
> To replicate:
> # Make sure that Yarn RM recovery is enabled
> # Deploy a test job
> # Terminate Yarn RM
> # Wait until the AM of the job terminates with: 
> {code}
> 2017-02-02 13:08:04 RetryInvocationHandler [WARN] Exception while invoking 
> class 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster
>  over rm2. Not retrying because failovers (30) exceeded maximum allowed (30)
> {code}
> # Restart RM
> The job gets a new attempt, but the old containers are not 
> terminated, so duplicate containers end up running. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
