Biao Liu created FLINK-11137:
--------------------------------

             Summary: Unexpected RegistrationTimeoutException of TaskExecutor
                 Key: FLINK-11137
                 URL: https://issues.apache.org/jira/browse/FLINK-11137
             Project: Flink
          Issue Type: Bug
          Components: TaskManager
    Affects Versions: 1.7.0
            Reporter: Biao Liu
            Assignee: Biao Liu


There is a race condition in {{TaskExecutor}} between starting the registration at
the ResourceManager (RM) and starting the registration timeout check. Currently the
RM leader retriever is started first, and the registration timeout check is started
afterwards. If the registration is fast enough, it can finish before the timeout
check has been started. The timeout check then fires later and terminates the
{{TaskExecutor}} with a {{RegistrationTimeoutException}}, even though the
registration has already succeeded.
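The race can be reproduced in a deterministic, single-threaded model. The sketch below is hypothetical: the class and its members ({{RegistrationRaceSketch}}, {{startRegistrationTimeout}}, {{stopRegistrationTimeout}}, {{onRegistrationSuccess}}, {{fireTimeouts}}) are illustrative names that only loosely follow the {{TaskExecutor}} code in the stack trace. It assumes that each armed timeout carries an id, that a successful registration disarms the current timeout, and that a timeout callback fails the executor only when its id is still the armed one. "Time passing" is simulated by running the queued callbacks explicitly. Arming the timeout before anything can complete the registration is shown as one possible fix. This report does not prescribe that fix; it is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/**
 * Hypothetical single-threaded model of the TaskExecutor start-up ordering.
 * Names are illustrative only and are not Flink's actual API.
 */
public class RegistrationRaceSketch {

    // Queued timeout callbacks; fireTimeouts() runs them to simulate the deadline passing.
    private final List<Runnable> scheduled = new ArrayList<>();
    // Id of the currently armed timeout; null means no timeout is armed.
    private UUID currentTimeoutId;
    boolean registered;
    boolean fatalError;

    void startRegistrationTimeout() {
        UUID id = UUID.randomUUID();
        currentTimeoutId = id;
        scheduled.add(() -> registrationTimeout(id));
    }

    void stopRegistrationTimeout() {
        currentTimeoutId = null;
    }

    void registrationTimeout(UUID id) {
        // Only a timeout that is still armed may fail the executor.
        if (id.equals(currentTimeoutId)) {
            fatalError = true;
        }
    }

    void onRegistrationSuccess() {
        registered = true;
        stopRegistrationTimeout(); // disarms whatever timeout is armed right now
    }

    void fireTimeouts() {
        scheduled.forEach(Runnable::run);
        scheduled.clear();
    }

    /** Reported order: the registration completes before the timeout is armed. */
    static RegistrationRaceSketch startBuggy() {
        RegistrationRaceSketch te = new RegistrationRaceSketch();
        te.onRegistrationSuccess();    // fast registration: nothing to disarm yet
        te.startRegistrationTimeout(); // armed afterwards and never disarmed
        te.fireTimeouts();
        return te;
    }

    /** Assumed fix: arm the timeout before the registration can complete. */
    static RegistrationRaceSketch startFixed() {
        RegistrationRaceSketch te = new RegistrationRaceSketch();
        te.startRegistrationTimeout();
        te.onRegistrationSuccess();    // disarms the pending timeout
        te.fireTimeouts();
        return te;
    }

    public static void main(String[] args) {
        RegistrationRaceSketch buggy = startBuggy();
        RegistrationRaceSketch fixed = startFixed();
        System.out.println("buggy: registered=" + buggy.registered + " fatal=" + buggy.fatalError);
        System.out.println("fixed: registered=" + fixed.registered + " fatal=" + fixed.fatalError);
    }
}
```

Running {{main}} prints "buggy: registered=true fatal=true" followed by "fixed: registered=true fatal=false". In the buggy ordering, the executor is registered and still hits the fatal timeout, which matches the symptom in the stack trace.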

The stack trace of the exception is shown below:
{quote}2018-11-05 14:16:52,464 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Fatal error occurred in TaskExecutor akka.tcp://flink@..../user/taskmanager_0.
org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
 at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1110)
 at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$4(TaskExecutor.java:1096)
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
 at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
 at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
 at akka.actor.ActorCell.invoke(ActorCell.scala:495)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
 at akka.dispatch.Mailbox.run(Mailbox.scala:224)
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
