[
https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017776#comment-14017776
]
Mark Hamstra commented on SPARK-2019:
-------------------------------------
Please don't leave the Affects Version/s selector on None. As with the SO
question, is this an issue that you are seeing with Spark 0.9.0? If so, then
the version of Spark that you are using is significantly out of date even on
the 0.9 branch. Several bug fixes are present in the 0.9.1 release of Spark,
which has been available for almost two months. There are a few more in the
current 0.9.2-SNAPSHOT code, and many more in the recent 1.0.0 release.
> Spark workers die/disappear when job fails for nearly any reason
> ----------------------------------------------------------------
>
> Key: SPARK-2019
> URL: https://issues.apache.org/jira/browse/SPARK-2019
> Project: Spark
> Issue Type: Bug
> Reporter: sam
>
> Whenever a job fails, we either have to reboot all the nodes or run 'sudo
> service spark-worker restart' across our cluster. I don't think this should
> happen; the job failures are often not even that serious. There is an SO
> question with 5 upvotes here:
> http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
> We shouldn't be giving restart privileges to our devs, so our sysadmin has to
> restart the workers frequently. When the sysadmin is not around, there is
> nothing our devs can do.
> Many thanks
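[Editor's note: the restart workaround described above is typically scripted over
SSH. The following is a minimal sketch only, assuming a hypothetical hosts file
(workers.txt) listing the worker machines and that the 'spark-worker' service
name from the report applies on each host; it is not part of the original issue.]

    # Sketch: restart the standalone worker service on every host listed in
    # workers.txt (hypothetical file, one hostname per line).
    # Assumes the operator account has SSH access and sudo rights on each host.
    while read -r host; do
      ssh "$host" "sudo service spark-worker restart"
    done < workers.txt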
--
This message was sent by Atlassian JIRA
(v6.2#6252)