[
https://issues.apache.org/jira/browse/SPARK-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-11306:
------------------------------
Component/s: Scheduler
> Executor JVM loss can lead to a hang in Standalone mode
> -------------------------------------------------------
>
> Key: SPARK-11306
> URL: https://issues.apache.org/jira/browse/SPARK-11306
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Reporter: Kay Ousterhout
> Assignee: Kay Ousterhout
>
> This commit:
> https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0
> introduced a bug where, in Standalone mode, if a task fails and crashes the
> JVM, the failure is treated as a "normal failure" (i.e., one unrelated to
> the task itself), so it isn't counted against the task's maximum number of
> failures:
> https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138.
> As a result, if a task fails in a way that results in it crashing the JVM,
> it will continuously be re-launched, resulting in a hang.
> Unfortunately, this issue is difficult to reproduce because of a race
> between the multiple code paths that handle executor losses: in the setup
> I'm using, Akka's notification that the executor was lost always reaches the
> TaskSchedulerImpl first, so the task eventually gets killed (see my recent
> email to the dev list).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)