[ 
https://issues.apache.org/jira/browse/SPARK-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651505#comment-14651505
 ] 

Apache Spark commented on SPARK-9552:
-------------------------------------

User 'GraceH' has created a pull request for this issue:
https://github.com/apache/spark/pull/7888

> Add force control for killExecutors to avoid false killing for those busy 
> executors
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-9552
>                 URL: https://issues.apache.org/jira/browse/SPARK-9552
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Jie Huang
>
> With dynamic allocation enabled, busy executors are sometimes killed by
> mistake. An executor that already has tasks assigned can be killed because
> it appears to have been idle for long enough (say 60 seconds). The root
> cause is that the task-launch listener event is delivered asynchronously.
> For example, an executor is being assigned tasks, but the listener
> notification has not been sent yet. Meanwhile, the executor's idle timeout
> in dynamic allocation (e.g., 60 seconds) expires and triggers killExecutor,
> because the timer expires before the listener event arrives.
> The task then runs on an executor that has been killed or is being killed,
> and it eventually fails.
> Here is the proposed fix: add a force control to killExecutor. If force is
> not set (i.e., false), we check whether the executor to be killed is idle
> or busy. If it has any tasks assigned, we do not kill it and return false
> (to indicate that the kill failed). Dynamic allocation should turn off
> force killing (i.e., force = false), so that an attempt to kill a busy
> executor fails. In that case the executor's idle timer is left in place;
> when the task assignment event arrives later, the idle timer is removed
> accordingly. This avoids killing busy executors by mistake under dynamic
> allocation.
> For all other usages, end users can decide for themselves whether to use
> force killing. If the option is turned on, killExecutor performs the kill
> without any status check.
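
The proposal above can be sketched roughly as follows. This is a minimal,
self-contained illustration only and is not the code from the pull request:
the names KillExecutorSketch, Backend, registerExecutor, assignTask, isBusy
and isAlive are hypothetical, and the per-executor task count stands in for
whatever bookkeeping the real scheduler keeps.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a scheduler backend.
// It is NOT Spark's actual class; it only illustrates the force flag.
object KillExecutorSketch {

  final class Backend {
    // executorId -> number of tasks currently assigned to that executor
    private val assignedTasks = mutable.Map.empty[String, Int]

    def registerExecutor(id: String): Unit = {
      if (!assignedTasks.contains(id)) assignedTasks(id) = 0
    }

    // Record a task assignment; unknown executors are ignored.
    def assignTask(id: String): Unit = {
      assignedTasks.get(id).foreach { n => assignedTasks(id) = n + 1 }
    }

    def isAlive(id: String): Boolean = assignedTasks.contains(id)

    def isBusy(id: String): Boolean = assignedTasks.getOrElse(id, 0) > 0

    /** Returns true if the executor was removed, false if the kill was refused. */
    def killExecutor(id: String, force: Boolean): Boolean = {
      if (!isAlive(id)) {
        false // unknown executor: nothing to kill
      } else if (!force && isBusy(id)) {
        false // busy executor and no force: refuse, caller sees a failed kill
      } else {
        assignedTasks.remove(id) // idle executor, or force = true: kill without further checks
        true
      }
    }
  }
}
```

Under these assumptions, a dynamic-allocation caller would pass force = false
and treat a false return as "leave this executor alone", while any other
caller could pass force = true to skip the status check:

```scala
val backend = new KillExecutorSketch.Backend
backend.registerExecutor("exec-1")
backend.assignTask("exec-1")
backend.killExecutor("exec-1", force = false) // refused, exec-1 stays alive
backend.killExecutor("exec-1", force = true)  // killed regardless of status
```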



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
