[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517285#comment-14517285
 ] 

Apache Spark commented on SPARK-5529:
-------------------------------------

User 'alexrovner' has created a pull request for this issue:
https://github.com/apache/spark/pull/5745

> BlockManager heartbeat expiration does not kill executor
> --------------------------------------------------------
>
>                 Key: SPARK-5529
>                 URL: https://issues.apache.org/jira/browse/SPARK-5529
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Hong Shen
>            Assignee: Hong Shen
>             Fix For: 1.4.0
>
>         Attachments: SPARK-5529.patch
>
>
> When I run a Spark job and one executor hangs, its BlockManager is removed 
> by the driver after 120s, but about half an hour passes before the executor 
> itself is removed by the driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
> exceeds 120000ms
> ....
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
> system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> {code}
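
The gap described above can be illustrated with a small sketch. This is not Spark's code: the `Driver` class, its methods, and the `kill_executor` flag are hypothetical names chosen for illustration. Only the 120000 ms threshold and the 147198 ms of silence are taken from the log. The sketch shows a heartbeat-expiry check that can drop only the BlockManager entry (the reported behavior) or also drop the executor (the behavior the issue title asks for).

```python
from dataclasses import dataclass, field

# Threshold taken from the log line "exceeds 120000ms".
TIMEOUT_MS = 120_000


@dataclass
class Driver:
    """Hypothetical stand-in for driver-side bookkeeping (not Spark's API)."""
    last_seen_ms: dict = field(default_factory=dict)   # executor id -> last heartbeat time
    block_managers: set = field(default_factory=set)   # executors with a registered BlockManager
    executors: set = field(default_factory=set)        # executors the scheduler still tracks

    def register(self, executor_id, now_ms):
        self.last_seen_ms[executor_id] = now_ms
        self.block_managers.add(executor_id)
        self.executors.add(executor_id)

    def heartbeat(self, executor_id, now_ms):
        # Ignore heartbeats from executors that were already expired.
        if executor_id in self.last_seen_ms:
            self.last_seen_ms[executor_id] = now_ms

    def expire(self, now_ms, kill_executor):
        """Drop state for silent executors and return the ids that expired."""
        expired = [e for e, t in self.last_seen_ms.items()
                   if now_ms - t > TIMEOUT_MS]
        for e in expired:
            del self.last_seen_ms[e]
            self.block_managers.discard(e)
            if kill_executor:
                # Assumed desired behavior: the executor goes away as well.
                self.executors.discard(e)
        return expired


# Reported behavior: the BlockManager is removed, the executor lingers.
reported = Driver()
reported.register("1", now_ms=0)
reported_expired = reported.expire(now_ms=147_198, kill_executor=False)

# Assumed desired behavior: expiration also removes the executor.
desired = Driver()
desired.register("1", now_ms=0)
desired_expired = desired.expire(now_ms=147_198, kill_executor=True)
```

In the first case executor "1" stays in `executors` although its BlockManager entry is gone, which corresponds to the window between 14:58:43 and 15:26:55 in the log. In the second case both entries are removed in the same pass.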



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
