[
https://issues.apache.org/jira/browse/FLINK-15347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077348#comment-17077348
]
Till Rohrmann commented on FLINK-15347:
---------------------------------------
The problem seems to be a race condition between the {{Dispatchers}} of two
consecutive leader sessions. Before the second leader instance can be created,
the former needs to be unregistered from the underlying {{AkkaRpcService}}
because both share the same endpoint name {{dispatcher}}. If the old leader is
not completely unregistered at that point, one sees the following exception:
{code}
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkRuntimeException: Could not create the Dispatcher rpc endpoint.
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:659)
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkRuntimeException: Could not create the Dispatcher rpc endpoint.
at org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherGatewayServiceFactory.create(DefaultDispatcherGatewayServiceFactory.java:66)
at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.createDispatcher(SessionDispatcherLeaderProcess.java:100)
at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$createDispatcherIfRunning$0(SessionDispatcherLeaderProcess.java:95)
at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.runIfState(AbstractDispatcherLeaderProcess.java:210)
at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.runIfStateIs(AbstractDispatcherLeaderProcess.java:198)
at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.createDispatcherIfRunning(SessionDispatcherLeaderProcess.java:95)
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
... 10 more
Caused by: akka.actor.InvalidActorNameException: actor name [dispatcher] is not unique!
at akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:129)
at akka.actor.dungeon.Children$class.reserveChild(Children.scala:135)
at akka.actor.ActorCell.reserveChild(ActorCell.scala:429)
at akka.actor.dungeon.Children$class.makeChild(Children.scala:275)
at akka.actor.dungeon.Children$class.attachChild(Children.scala:49)
at akka.actor.ActorCell.attachChild(ActorCell.scala:429)
at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:753)
at org.apache.flink.runtime.rpc.akka.AkkaRpcService.startServer(AkkaRpcService.java:219)
at org.apache.flink.runtime.rpc.RpcEndpoint.<init>(RpcEndpoint.java:129)
at org.apache.flink.runtime.rpc.FencedRpcEndpoint.<init>(FencedRpcEndpoint.java:48)
at org.apache.flink.runtime.rpc.PermanentlyFencedRpcEndpoint.<init>(PermanentlyFencedRpcEndpoint.java:36)
at org.apache.flink.runtime.dispatcher.Dispatcher.<init>(Dispatcher.java:137)
at org.apache.flink.runtime.dispatcher.StandaloneDispatcher.<init>(StandaloneDispatcher.java:39)
at org.apache.flink.runtime.dispatcher.SessionDispatcherFactory.createDispatcher(SessionDispatcherFactory.java:44)
at org.apache.flink.runtime.dispatcher.SessionDispatcherFactory.createDispatcher(SessionDispatcherFactory.java:29)
at org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherGatewayServiceFactory.create(DefaultDispatcherGatewayServiceFactory.java:60)
... 16 more
{code}
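To illustrate the ordering problem, here is a minimal, self-contained sketch.
It does not use Flink or Akka and is not the actual fix; {{EndpointRegistry}},
{{unregisterAsync}} and {{registerAfter}} are hypothetical names that only
model a service with unique endpoint names and asynchronous unregistration.
The racy path registers the name again while the old endpoint is still
shutting down; the ordered path chains the registration on the old endpoint's
termination future.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an RPC service that enforces unique endpoint names. */
public class EndpointRegistry {
    private final Set<String> names = ConcurrentHashMap.newKeySet();

    /** Registers a name and fails if it is still taken (models the "not unique" error). */
    public void register(String name) {
        if (!names.add(name)) {
            throw new IllegalStateException("endpoint name [" + name + "] is not unique!");
        }
    }

    /** Frees the name only once the given shutdown signal has completed. */
    public CompletableFuture<Void> unregisterAsync(String name, CompletableFuture<Void> shutdownSignal) {
        return shutdownSignal.thenRun(() -> names.remove(name));
    }

    /** Registers the successor only after the predecessor's termination future is done. */
    public CompletableFuture<Void> registerAfter(CompletableFuture<Void> previousTermination, String name) {
        return previousTermination.thenRun(() -> register(name));
    }

    public static void main(String[] args) {
        EndpointRegistry registry = new EndpointRegistry();
        registry.register("dispatcher"); // first leader session

        // The old endpoint starts shutting down but has not finished yet.
        CompletableFuture<Void> shutdown = new CompletableFuture<>();
        CompletableFuture<Void> terminated = registry.unregisterAsync("dispatcher", shutdown);

        // Racy variant: registering before termination completes fails.
        try {
            registry.register("dispatcher");
        } catch (IllegalStateException e) {
            System.out.println("racy: " + e.getMessage());
        }

        // Ordered variant: chain the creation on the termination future.
        CompletableFuture<Void> created = registry.registerAfter(terminated, "dispatcher");
        shutdown.complete(null); // old endpoint finishes stopping
        created.join();
        System.out.println("ordered: registered after termination");
    }
}
```

In this sketch the second registration succeeds only because it is sequenced
after the future that signals the complete removal of the first endpoint.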
> ZooKeeperDefaultDispatcherRunnerTest.testResourceCleanupUnderLeadershipChange
> failed on Travis
> ----------------------------------------------------------------------------------------------
>
> Key: FLINK-15347
> URL: https://issues.apache.org/jira/browse/FLINK-15347
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Critical
> Labels: test-stability
> Fix For: 1.11.0
>
>
> The test
> {{ZooKeeperDefaultDispatcherRunnerTest.testResourceCleanupUnderLeadershipChange}}
> failed on Travis because it got stuck.
> https://api.travis-ci.org/v3/job/627661879/log.txt