[
https://issues.apache.org/jira/browse/FLINK-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-11137.
---------------------------------
Resolution: Duplicate
> Unexpected RegistrationTimeoutException of TaskExecutor
> -------------------------------------------------------
>
> Key: FLINK-11137
> URL: https://issues.apache.org/jira/browse/FLINK-11137
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.7.0
> Reporter: Biao Liu
> Assignee: Biao Liu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> There is a race condition in {{TaskExecutor}} between starting registration at
> the RM and starting the registration timeout check. Currently we start the RM
> leader retriever first and only then start the registration timeout check. If
> registration is fast enough, it can finish before the registration timeout
> check has been started. The timeout check then fails later, even though
> registration already succeeded.
> The stack trace of the exception is below:
> {quote}2018-11-05 14:16:52,464 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Fatal error occurred in TaskExecutor akka.tcp://flink@..../user/taskmanager_0.
>
> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
> at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1110)
> at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$4(TaskExecutor.java:1096)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
> at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> {quote}
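
The ordering problem described in the report can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions, not Flink's actual code: the class `RegistrationTimeoutRace`, its fields, and the `run` helper are hypothetical, and the timer queue stands in for the RPC main-thread scheduling seen in the stack trace. It assumes that a successful registration cancels the timeout by clearing an id, and that a timeout armed after that point is never cancelled.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.UUID;

/** Hypothetical, simplified model of the reported race; not Flink's TaskExecutor. */
class RegistrationTimeoutRace {
    // Id of the currently armed timeout; null means no timeout is armed.
    private UUID currentTimeoutId;
    // Set when an armed timeout fires without having been cancelled.
    private boolean fatalError;
    // Stand-in for delayed tasks scheduled on the main thread.
    private final Queue<Runnable> timers = new ArrayDeque<>();

    /** Arms a new timeout and schedules its check. */
    void startRegistrationTimeout() {
        final UUID id = UUID.randomUUID();
        currentTimeoutId = id;
        timers.add(() -> registrationTimeout(id));
    }

    /** Fails only if the timeout that fired is still the armed one. */
    void registrationTimeout(UUID id) {
        if (id.equals(currentTimeoutId)) {
            fatalError = true;
        }
    }

    /** Assumption: a successful registration cancels the armed timeout. */
    void onRegistrationSuccess() {
        currentTimeoutId = null;
    }

    /** Simulates the maximum registration duration elapsing. */
    void fireTimers() {
        while (!timers.isEmpty()) {
            timers.poll().run();
        }
    }

    /** Returns true if the simulated executor ends in a fatal timeout. */
    static boolean run(boolean armTimeoutFirst) {
        RegistrationTimeoutRace te = new RegistrationTimeoutRace();
        if (armTimeoutFirst) {
            te.startRegistrationTimeout();
            te.onRegistrationSuccess();
        } else {
            // Order from the report: registration finishes before the
            // timeout is armed, so nothing cancels the timeout afterwards.
            te.onRegistrationSuccess();
            te.startRegistrationTimeout();
        }
        te.fireTimers();
        return te.fatalError;
    }

    public static void main(String[] args) {
        System.out.println("registration before timeout armed, fatal: " + run(false));
        System.out.println("timeout armed before registration, fatal: " + run(true));
    }
}
```

In this model, arming the timeout before registration can complete is one ordering that avoids the spurious failure. Whether that matches the change made under the issue this one duplicates is not stated here.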
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)