[ 
https://issues.apache.org/jira/browse/FLINK-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907612#comment-15907612
 ] 

ASF GitHub Bot commented on FLINK-5999:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3526

    [FLINK-5999] [resMgnr] Move JobLeaderIdService shut down into ResourceManagerRunner

    The JobLeaderIdService is created by the ResourceManagerRunner and then handed to a
    ResourceManager. Previously, the ResourceManager stopped the service before being stopped
    itself. This could lead to a ConcurrentModificationException when a state-changing action
    executed by the actor thread modified the service while it was being stopped. To avoid this
    concurrent modification, the service is now shut down after the ResourceManager has been
    shut down.
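
The ownership rule described above can be sketched as follows. This is only an illustration, not the code from the pull request: `SharedService`, `Component`, and `Runner` are hypothetical stand-ins for the JobLeaderIdService, the ResourceManager, and the ResourceManagerRunner, and the event list exists only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownOrderSketch {

    /** Hypothetical stand-in for a service that is created outside the component using it. */
    static class SharedService {
        private final List<String> events;

        SharedService(List<String> events) {
            this.events = events;
        }

        void stop() {
            events.add("service stopped");
        }
    }

    /** Hypothetical stand-in for the component: it uses the service but does not own it. */
    static class Component {
        private final SharedService service;
        private final List<String> events;

        Component(SharedService service, List<String> events) {
            this.service = service;
            this.events = events;
        }

        void shutDown() {
            // Deliberately does NOT call service.stop(); the creator of the service does that.
            events.add("component stopped");
        }
    }

    /** Hypothetical runner: it creates the service, so it is responsible for stopping it. */
    static class Runner {
        private final SharedService service;
        private final Component component;

        Runner(List<String> events) {
            this.service = new SharedService(events);
            this.component = new Component(service, events);
        }

        void shutDown() {
            // Stop the user of the service first, then the service itself.
            component.shutDown();
            service.stop();
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        new Runner(events).shutDown();
        System.out.println(events); // prints [component stopped, service stopped]
    }
}
```

The point of the ordering is that once the component has stopped, nothing it runs can still touch the service while the service is being cleared.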

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink resourceManagerServiceLifecycle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3526
    
----
commit 978ad4d55c0b52931c00d994c676dfd1d57b45b0
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-03-13T14:55:02Z

    [FLINK-5999] [resMgnr] Move JobLeaderIdService shut down into ResourceManagerRunner
    
    The JobLeaderIdService is created by the ResourceManagerRunner and then handed to a
    ResourceManager. Previously, the ResourceManager stopped the service before being stopped
    itself. This could lead to a ConcurrentModificationException when a state-changing action
    executed by the actor thread modified the service while it was being stopped. To avoid this
    concurrent modification, the service is now shut down after the ResourceManager has been
    shut down.

----


> MiniClusterITCase.runJobWithMultipleRpcServices fails
> -----------------------------------------------------
>
>                 Key: FLINK-5999
>                 URL: https://issues.apache.org/jira/browse/FLINK-5999
>             Project: Flink
>          Issue Type: Test
>          Components: Distributed Coordination, Tests
>            Reporter: Ufuk Celebi
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: test-stability
>
> In a branch with unrelated changes to the web frontend I've seen the 
> following test fail:
> {code}
> runJobWithMultipleRpcServices(org.apache.flink.runtime.minicluster.MiniClusterITCase)  Time elapsed: 1.145 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>       at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
>       at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
>       at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
>       at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
>       at org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:182)
>       at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:83)
>       at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:78)
>       at org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:313)
>       at org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:281)
>       at org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleRpcServices(MiniClusterITCase.java:72)
> {code}
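
The top frames of the trace show a HashMap value iterator failing inside JobLeaderIdService.clear. The sketch below reproduces only that fail-fast mechanism, deterministically and in a single thread. It is not the Flink code: the class, method, and map contents are hypothetical, and in the reported failure the modification presumably came from a different thread (the actor thread) rather than from inside the loop.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastIterationSketch {

    /**
     * Iterates over the map's values while adding a new key, which is a structural modification.
     * Returns true if the iterator detected the modification and threw.
     */
    static boolean mutateDuringIteration(Map<String, String> listeners) {
        try {
            for (String listener : listeners.values()) {
                // Hypothetical stand-in for a state-changing action that runs during iteration.
                listeners.put("job-added-during-clear", "listener");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // HashMap iterators are fail-fast: the next call to next() notices the change.
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> listeners = new HashMap<>();
        listeners.put("job-1", "listener-1");
        listeners.put("job-2", "listener-2");
        System.out.println(mutateDuringIteration(listeners)); // prints true
    }
}
```

With two threads the same check fires only when the timing lines up, which fits an intermittent test failure like the one reported here.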



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
