[ 
https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-18976:
------------------------------
    Target Version/s:   (was: 1.6.1)
       Fix Version/s:     (was: 1.6.1)

[~liujianhui] please read http://spark.apache.org/contributing.html before 
opening a JIRA. Don't set fix/target version. It doesn't even make sense to 
target a version released a year ago.

Can you test this vs 2.1 or master? 
Can you provide any reproduction or more detail?
I'm not sure this is enough info.

> in standlone mode,executor expired by HeartbeanReceiver that still take up 
> cores but no tasks assigned to 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18976
>                 URL: https://issues.apache.org/jira/browse/SPARK-18976
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 1.6.1
>         Environment: jdk1.8.0_77 Red Hat 4.4.7-11
>            Reporter: liujianhui
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> h2. scene
> when executor expired by HeartbeatReceiver in driver, driver will mark that 
> executor as not live, task scheduler will not assign tasks to that executor, 
> but that executor's status will always be running and take up cores, the 
> executor 18 was expired and no task running, the task time far less than the 
> normal executor 142, but in app page, the executor is running
> !screenshot-1.png!
> !screenshot-2.png!
> !screenshot-3.png!
> h2.process:
> # exeuctor expired by HearbeatReceiver because the last heartbeat execeed the 
> executor timeout
> # executor will be removed in CoarseGrainedSchdulerBackend.killExecutors, so 
> that executor will marked as dead, it will not scheduled as offer since now 
> because it in executorsPendingToRemove
> # status of that executor is running because the CoarseGrainedExecutorBackend 
> processor is also exist and it register block manager to the driver every 
> 10s, log as 
> {code}
> 16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master. 
> {code}
> h2. resolve 
> when the register times exceed some threshold(e.g. 10), the executor should 
> exit as zero 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to