[ https://issues.apache.org/jira/browse/FLINK-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907612#comment-15907612 ]
ASF GitHub Bot commented on FLINK-5999:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/3526
[FLINK-5999] [resMgnr] Move JobLeaderIdService shut down into ResourceManagerRunner
The JobLeaderIdService is created by the ResourceManagerRunner and then
handed to the ResourceManager. Previously, the ResourceManager stopped the
service as part of its own shut down, before it had itself been stopped.
This could lead to a ConcurrentModificationException when the actor thread
executed a state-changing action at the same time. To avoid this concurrent
modification, the service is now shut down only after the ResourceManager
has been shut down.
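The ordering described above can be illustrated with a minimal sketch. It is not Flink's actual code: the classes Service, Manager and Runner are hypothetical stand-ins for JobLeaderIdService, ResourceManager and ResourceManagerRunner, and the sketch assumes that the manager's shutDown() returns only once no other thread can touch the service any more.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical stand-in for JobLeaderIdService; not Flink's real class. */
    static final class Service {
        private final List<String> log;

        Service(List<String> log) {
            this.log = log;
        }

        void stop() {
            log.add("service stopped");
        }
    }

    /** Hypothetical stand-in for ResourceManager: it uses the service but does not own it. */
    static final class Manager {
        private final Service service;
        private final List<String> log;

        Manager(Service service, List<String> log) {
            this.service = service;
            this.log = log;
        }

        void shutDown() {
            // Deliberately no service.stop() here: the creator of the service stops it.
            log.add("manager stopped");
        }
    }

    /** Hypothetical stand-in for ResourceManagerRunner: it creates the service, so it also stops it. */
    static final class Runner {
        private final List<String> log = new ArrayList<>();
        private final Service service = new Service(log);
        private final Manager manager = new Manager(service, log);

        List<String> shutDown() {
            manager.shutDown(); // first stop the component that may still mutate the service
            service.stop();     // only then stop the service itself
            return log;
        }
    }

    /** Runs one shutdown and returns the recorded order of events. */
    static List<String> runShutdown() {
        return new Runner().shutDown();
    }

    public static void main(String[] args) {
        System.out.println(runShutdown()); // prints [manager stopped, service stopped]
    }
}
```

The point of the sketch is ownership: whichever component creates a shared service is also the one that stops it, and it does so only after every user of the service has been stopped.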
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink resourceManagerServiceLifecycle
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3526.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3526
----
commit 978ad4d55c0b52931c00d994c676dfd1d57b45b0
Author: Till Rohrmann <[email protected]>
Date: 2017-03-13T14:55:02Z
[FLINK-5999] [resMgnr] Move JobLeaderIdService shut down into ResourceManagerRunner
The JobLeaderIdService is created by the ResourceManagerRunner and then
handed to the ResourceManager. Previously, the ResourceManager stopped the
service as part of its own shut down, before it had itself been stopped.
This could lead to a ConcurrentModificationException when the actor thread
executed a state-changing action at the same time. To avoid this concurrent
modification, the service is now shut down only after the ResourceManager
has been shut down.
----
> MiniClusterITCase.runJobWithMultipleRpcServices fails
> -----------------------------------------------------
>
> Key: FLINK-5999
> URL: https://issues.apache.org/jira/browse/FLINK-5999
> Project: Flink
> Issue Type: Test
> Components: Distributed Coordination, Tests
> Reporter: Ufuk Celebi
> Assignee: Till Rohrmann
> Priority: Critical
> Labels: test-stability
>
> In a branch with unrelated changes to the web frontend I've seen the
> following test fail:
> {code}
> runJobWithMultipleRpcServices(org.apache.flink.runtime.minicluster.MiniClusterITCase) Time elapsed: 1.145 sec <<< ERROR!
> java.util.ConcurrentModificationException: null
>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
>     at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
>     at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
>     at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
>     at org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:182)
>     at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:83)
>     at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:78)
>     at org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:313)
>     at org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:281)
>     at org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleRpcServices(MiniClusterITCase.java:72)
> {code}
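The top frames of the trace show a HashMap value iterator failing inside a clear() method. As a rough illustration of that failure mode, the following single-threaded sketch triggers the same exception deterministically by structurally modifying a HashMap while iterating over its values. The class name, map contents and keys are made up for the example. In the reported failure the modification presumably came from a second thread, which makes the exception intermittent rather than guaranteed.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeSketch {

    /** Returns true if iterating while modifying the map raised a ConcurrentModificationException. */
    static boolean mutateWhileIterating() {
        // Hypothetical map contents; the real service keeps its own per-job state.
        Map<String, String> listeners = new HashMap<>();
        listeners.put("job-1", "listener-1");
        listeners.put("job-2", "listener-2");

        try {
            for (String listener : listeners.values()) {
                // Adding a new key is a structural modification, so the
                // iterator's next call to next() fails fast.
                listeners.put("job-3", "listener-3");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutateWhileIterating()); // prints true
    }
}
```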
--
This message was sent by Atlassian JIRA (v6.3.15#6346)