[ https://issues.apache.org/jira/browse/FLINK-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907612#comment-15907612 ]
ASF GitHub Bot commented on FLINK-5999:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3526

    [FLINK-5999] [resMgnr] Move JobLeaderIdService shut down into ResourceManagerRunner

    The JobLeaderIdService is created by the ResourceManagerRunner and then handed to a ResourceManager. Previously, the ResourceManager stopped the service before being stopped itself. This could lead to a ConcurrentModificationException when a state-changing action executed by the actor thread ran while the service was being stopped. To avoid this concurrent modification, the service is now shut down after the ResourceManager has been shut down.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink resourceManagerServiceLifecycle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3526

----
commit 978ad4d55c0b52931c00d994c676dfd1d57b45b0
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-03-13T14:55:02Z

    [FLINK-5999] [resMgnr] Move JobLeaderIdService shut down into ResourceManagerRunner
----

> MiniClusterITCase.runJobWithMultipleRpcServices fails
> -----------------------------------------------------
>
>                 Key: FLINK-5999
>                 URL: https://issues.apache.org/jira/browse/FLINK-5999
>             Project: Flink
>          Issue Type: Test
>          Components: Distributed Coordination, Tests
>            Reporter: Ufuk Celebi
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: test-stability
>
> In a branch with unrelated changes to the web frontend I've seen the following test fail:
> {code}
> runJobWithMultipleRpcServices(org.apache.flink.runtime.minicluster.MiniClusterITCase)  Time elapsed: 1.145 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
> 	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
> 	at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
> 	at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
> 	at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
> 	at org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:182)
> 	at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:83)
> 	at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:78)
> 	at org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:313)
> 	at org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:281)
> 	at org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleRpcServices(MiniClusterITCase.java:72)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
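For readers who want to see why the shutdown order matters, here is a minimal sketch of the failure mode in the stack trace and of the reordering described in the pull request. It is not the Flink code: `LeaderIdService`, `Manager`, and `stopsCleanly` are hypothetical stand-ins, and the interleaving that the actor thread causes in the real system is simulated on a single thread through a callback so that the outcome is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class ShutdownOrderSketch {

    /** Hypothetical stand-in for a service that keeps per-job state in a plain HashMap. */
    static final class LeaderIdService {
        private final Map<String, Runnable> listeners = new HashMap<>();

        void addJob(String jobId, Runnable onStop) {
            listeners.put(jobId, onStop);
        }

        void removeJob(String jobId) {
            listeners.remove(jobId);
        }

        /** Iterates over the map; a structural change during the loop makes the iterator fail fast. */
        void stop() {
            for (Runnable onStop : listeners.values()) {
                onStop.run();
            }
            listeners.clear();
        }
    }

    /** Hypothetical stand-in for the component that uses the service and changes its state. */
    static final class Manager {
        private final LeaderIdService service;
        private boolean running = true;

        Manager(LeaderIdService service) {
            this.service = service;
        }

        /** State-changing action; it only touches the service while the manager is running. */
        void onJobFinished(String jobId) {
            if (running) {
                service.removeJob(jobId);
            }
        }

        void shutDown() {
            running = false;
        }
    }

    /**
     * Returns true if shutdown completes without a ConcurrentModificationException.
     * managerFirst = false models the old order (service stopped while the manager is still active),
     * managerFirst = true models the new order (manager stopped first, then the service).
     */
    static boolean stopsCleanly(boolean managerFirst) {
        LeaderIdService service = new LeaderIdService();
        Manager manager = new Manager(service);
        // Each callback simulates a state change arriving while the service is being stopped.
        service.addJob("job-1", () -> manager.onJobFinished("job-1"));
        service.addJob("job-2", () -> manager.onJobFinished("job-2"));
        try {
            if (managerFirst) {
                manager.shutDown();
            }
            service.stop();
            manager.shutDown();
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("old order stops cleanly: " + stopsCleanly(false));
        System.out.println("new order stops cleanly: " + stopsCleanly(true));
    }
}
```

In this sketch the old order fails because the map is modified while `stop()` is still iterating over it, whereas stopping the manager first guarantees that no further state changes reach the service. The component that creates the service is also the one that stops it, which mirrors the ownership described in the pull request.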