[
https://issues.apache.org/jira/browse/SPARK-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Imran Rashid resolved SPARK-15865.
----------------------------------
Resolution: Fixed
Fix Version/s: 2.1.0
Issue resolved by pull request 13603
[https://github.com/apache/spark/pull/13603]
> Blacklist should not result in job hanging with less than 4 executors
> ---------------------------------------------------------------------
>
> Key: SPARK-15865
> URL: https://issues.apache.org/jira/browse/SPARK-15865
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.0.0
> Reporter: Imran Rashid
> Assignee: Imran Rashid
> Fix For: 2.1.0
>
>
> Currently, when you turn on blacklisting with
> {{spark.scheduler.executorTaskBlacklistTime}} but have fewer than
> {{spark.task.maxFailures}} executors, you can end up with a job "hung" after
> some task failures.
> If some task fails repeatedly (say, due to an error in user code), then the
> task will be blacklisted from the given executor. It will then try another
> executor, and fail there as well. However, after it has tried all available
> executors, the scheduler will simply stop trying to schedule the task
> anywhere. The job doesn't fail, nor does it succeed -- it simply waits.
> Eventually, when the blacklist expires, the task will be scheduled again.
> But that can be quite far in the future, and in the meantime the user just
> observes a stuck job.
> Instead, we should abort the stage (and fail any dependent jobs) as soon as
> we detect tasks that cannot be scheduled.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]