[ 
https://issues.apache.org/jira/browse/SPARK-17501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484185#comment-15484185
 ] 

cen yuhai edited comment on SPARK-17501 at 9/15/16 9:19 AM:
------------------------------------------------------------

I can hardly reproduce this error, but I may have found the root cause. In 
HeartbeatReceiver, an executor is recorded in executorLastSeen, whereas its 
BlockManager is recorded in blockManagerInfo in BlockManagerMasterEndpoint. 
The driver should not ask the executor to re-register its BlockManager; I 
think just putting the executor into executorLastSeen will resolve this problem.
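
To make the suspected loop concrete, here is a minimal toy model of the two driver-side registries mentioned above. It is only a sketch under stated assumptions, not Spark's code: the `Driver` class, its methods, and `simulate` are hypothetical names, and the 10-second interval is simply read off the log timestamps in the issue description. It assumes that a heartbeat from an executor missing from executorLastSeen is answered with a re-register request, and that re-registering the BlockManager only touches blockManagerInfo.

```python
# Toy model of the two driver-side registries named in the comment.
# This is NOT Spark's code; all class and method names are illustrative.

class Driver:
    def __init__(self):
        # Stands in for HeartbeatReceiver.executorLastSeen
        self.executor_last_seen = {}
        # Stands in for BlockManagerMasterEndpoint.blockManagerInfo
        self.block_manager_info = {}

    def register_block_manager(self, executor_id):
        # Assumption: only the block manager registry is updated here.
        self.block_manager_info[executor_id] = "registered"

    def heartbeat(self, executor_id, now):
        # Returns True when the executor is told to re-register its BlockManager.
        if executor_id in self.executor_last_seen:
            self.executor_last_seen[executor_id] = now
            return False
        return True


def simulate(driver, executor_id, heartbeats, interval=10):
    """Count how many heartbeats end in a re-register request."""
    reregistrations = 0
    for i in range(heartbeats):
        if driver.heartbeat(executor_id, now=i * interval):
            driver.register_block_manager(executor_id)
            reregistrations += 1
    return reregistrations
```

In this model, an executor that is absent from `executor_last_seen` is asked to re-register on every heartbeat, because `register_block_manager` never adds it there. Seeding `executor_last_seen` with the executor ID, which is the change proposed above, stops the loop.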


was (Author: cenyuhai):
I can hardly reproduce this error, but I may have found the root cause. In 
HeartbeatReceiver, an executor is recorded in executorLastSeen, whereas its 
BlockManager is recorded in blockManagerInfo in BlockManagerMasterEndpoint. 
The driver should not ask the executor to re-register its BlockManager; the 
executor needs to send RegisterExecutor.

> Re-register BlockManager again and again
> ----------------------------------------
>
>                 Key: SPARK-17501
>                 URL: https://issues.apache.org/jira/browse/SPARK-17501
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2
>            Reporter: cen yuhai
>            Priority: Minor
>
> After re-registering many times, the executor exits because of a timeout 
> exception:
> {code}
> 16/09/11 04:02:42 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:02:42 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:02:42 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:02:42 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:02:42 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:02:52 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:02:52 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:02:52 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:02:52 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:02:52 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:03:02 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:03:02 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:03:02 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:03:02 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:03:02 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:03:12 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:03:12 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:03:12 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:03:12 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:03:12 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:03:22 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:03:22 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:03:22 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:03:22 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:03:22 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:03:32 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:03:32 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:03:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:03:32 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:03:32 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:03:42 INFO executor.Executor: Told to re-register on heartbeat
> 16/09/11 04:03:42 INFO storage.BlockManager: BlockManager re-registering with master
> 16/09/11 04:03:42 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/09/11 04:03:42 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/09/11 04:03:42 INFO storage.BlockManager: Reporting 0 blocks to the master.
> 16/09/11 04:03:45 ERROR executor.CoarseGrainedExecutorBackend: Cannot register with driver: spark://[email protected]:22168
> org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
>         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
>         at scala.util.Try$.apply(Try.scala:161)
>         at scala.util.Failure.recover(Try.scala:185)
>         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
>         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>         at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
>         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
>         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>         at scala.concurrent.Promise$class.complete(Promise.scala:55)
>         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
>         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>         at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643)
>         at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
>         at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
>         at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
>         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>         at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634)
>         at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685)
>         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
>         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> {code}



