[
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin updated SPARK-5529:
-------------------------------
Description:
When I run a spark job, one executor is hold, after 120s, blockManager is
removed by driver, but after half an hour before the executor is remove by
driver. Here is the log:
{code}
15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms
exceeds 120000ms
....
15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on
10.215.143.14: remote Akka client disassociated
15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote
system [akka.tcp://[email protected]:46182] has failed, address is
now gated for [5000] ms. Reason is: [Disassociated].
15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3,
10.215.143.14): ExecutorLostFailure (executor 1 lost)
15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove
non-existent executor 1
15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1
from BlockManagerMaster.
15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in
removeExecutor
{code}
was:
When I run a spark job, one executor is hold, after 120s, blockManager is
removed by driver, but after half an hour before the executor is remove by
driver. Here is the log:
15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms
exceeds 120000ms
....
15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on
10.215.143.14: remote Akka client disassociated
15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote
system [akka.tcp://[email protected]:46182] has failed, address is
now gated for [5000] ms. Reason is: [Disassociated].
15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3,
10.215.143.14): ExecutorLostFailure (executor 1 lost)
15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove
non-existent executor 1
15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1
from BlockManagerMaster.
15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in
removeExecutor
> Executor is still hold while BlockManager has been removed
> ----------------------------------------------------------
>
> Key: SPARK-5529
> URL: https://issues.apache.org/jira/browse/SPARK-5529
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Hong Shen
>
> When I run a spark job, one executor is hold, after 120s, blockManager is
> removed by driver, but after half an hour before the executor is remove by
> driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms
> exceeds 120000ms
> ....
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote
> system [akka.tcp://[email protected]:46182] has failed, address is
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3,
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in
> removeExecutor
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]