[ https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liujianhui updated SPARK-18976:
-------------------------------
Description:
h2. Scene
When the HeartbeatReceiver in the driver expires an executor, the driver marks
that executor as no longer alive and the task scheduler stops assigning tasks to
it. However, the executor's status stays RUNNING and it keeps holding its cores.
In the screenshots below, executor 18 was expired and has no running tasks; its
task time is far lower than that of the healthy executor 142, yet the
application page still shows executor 18 as running.
!screenshot-1.png!
!screenshot-2.png!
!screenshot-3.png!
h2. Process
# The HeartbeatReceiver expires the executor because its last heartbeat is
older than the executor timeout.
# The executor is removed in CoarseGrainedSchedulerBackend.killExecutors, so it
is marked as dead; from then on it is never included in resource offers because
it is in executorsPendingToRemove.
# The executor's status stays RUNNING because its CoarseGrainedExecutorBackend
process is still alive, and on every heartbeat (every 10 s) it is told to
re-register its BlockManager with the driver, as the following log shows:
{code}
16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat
16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master
16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager
16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager
16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat
16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master
16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager
16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager
16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master.
16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat
16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master
16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager
16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager
16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master.
16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat
16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master
16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager
16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager
16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master.
{code}
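The three steps above can be illustrated with a toy model of the driver-side state. This is a simplified sketch, not Spark code: the class `DriverModel`, its methods, and the 120 s timeout are hypothetical and only mirror the roles of HeartbeatReceiver, killExecutors, and executorsPendingToRemove as described in this report. It shows why an expired executor keeps being asked to re-register without ever receiving tasks again.

```python
from dataclasses import dataclass, field


@dataclass
class DriverModel:
    """Hypothetical, simplified model of the driver state described above."""
    executor_timeout_s: float
    last_seen: dict = field(default_factory=dict)        # executor id -> last heartbeat time
    pending_to_remove: set = field(default_factory=set)  # plays the role of executorsPendingToRemove

    def expire_dead_hosts(self, now: float) -> None:
        # Step 1: expire executors whose last heartbeat is older than the timeout.
        for executor_id, seen in list(self.last_seen.items()):
            if now - seen > self.executor_timeout_s:
                del self.last_seen[executor_id]
                # Step 2: the kill marks it as pending removal, so it gets no more offers.
                self.pending_to_remove.add(executor_id)

    def offerable_executors(self, registered: set) -> set:
        # Only executors not pending removal are offered to the task scheduler.
        return registered - self.pending_to_remove

    def heartbeat(self, executor_id: str, now: float) -> bool:
        """Return True if the executor is told to re-register its BlockManager."""
        if executor_id in self.last_seen:
            self.last_seen[executor_id] = now
            return False
        # Step 3: an already-expired executor is asked to re-register, but this
        # does not undo its removal from scheduling, so it idles while holding cores.
        return True


driver = DriverModel(executor_timeout_s=120.0, last_seen={"18": 0.0, "142": 100.0})
driver.expire_dead_hosts(now=130.0)
print(driver.offerable_executors({"18", "142"}))  # {'142'}
print(driver.heartbeat("18", now=140.0))           # True: told to re-register
```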
h2. Resolve
When the number of re-registrations exceeds some threshold (e.g. 10), the
executor should exit with status 0.
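The proposed resolution can be sketched as an executor-side counter. This is a hypothetical illustration, not an existing Spark API: `ReregisterGuard` and `heartbeat_loop` are made-up names, and counting *consecutive* re-register replies (resetting on a normal heartbeat) is an assumption of this sketch, since the proposal only says "exceeds some threshold".

```python
from typing import Iterable, Optional


class ReregisterGuard:
    """Hypothetical executor-side guard for the proposed fix (not Spark code)."""

    def __init__(self, threshold: int = 10) -> None:
        self.threshold = threshold
        self.consecutive_reregisters = 0

    def on_heartbeat_response(self, reregister_block_manager: bool) -> bool:
        """Record one heartbeat reply; return True when the executor should exit."""
        if reregister_block_manager:
            self.consecutive_reregisters += 1
        else:
            # Assumption: a normal reply means the driver still tracks this executor.
            self.consecutive_reregisters = 0
        return self.consecutive_reregisters > self.threshold


def heartbeat_loop(replies: Iterable[bool], guard: ReregisterGuard) -> Optional[int]:
    """Feed simulated heartbeat replies; return the exit status, or None if still running."""
    for reply in replies:
        if guard.on_heartbeat_response(reply):
            return 0  # proposed behaviour: exit with status 0 instead of lingering
    return None


print(heartbeat_loop([True] * 11, ReregisterGuard(threshold=10)))  # 0
```

In a real executor the returned status would be passed to the process exit call; the sketch only returns it so the decision logic can be exercised in isolation.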
> In standalone mode, an executor expired by HeartbeatReceiver still takes up
> cores but no tasks are assigned to it
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-18976
> URL: https://issues.apache.org/jira/browse/SPARK-18976
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 1.6.1
> Environment: jdk1.8.0_77 Red Hat 4.4.7-11
> Reporter: liujianhui
> Fix For: 1.6.1
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)