[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-2019:
-----------------------------------

    Description: 
We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5-upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails

We shouldn't be giving restart privileges to our devs, and therefore our sysadmin has to frequently restart the workers. When the sysadmin is not around, there is nothing our devs can do.

Many thanks

  was:
We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5-upvoted SO question here: http://stackoverflow.com/questions/22Hey @sam031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails

We shouldn't be giving restart privileges to our devs, and therefore our sysadmin has to frequently restart the workers. When the sysadmin is not around, there is nothing our devs can do.

Many thanks


> Spark workers die/disappear when job fails for nearly any reason
> ----------------------------------------------------------------
>
>                 Key: SPARK-2019
>                 URL: https://issues.apache.org/jira/browse/SPARK-2019
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: sam
>
> We either have to reboot all the nodes, or run 'sudo service spark-worker restart' across our cluster. I don't think this should happen - the job failures are often not even that bad. There is a 5-upvoted SO question here: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
> We shouldn't be giving restart privileges to our devs, and therefore our sysadmin has to frequently restart the workers. When the sysadmin is not around, there is nothing our devs can do.
> Many thanks



--
This message was sent by Atlassian JIRA
(v6.2#6252)
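For reference, running 'sudo service spark-worker restart' across the cluster, as the report describes, amounts to a loop like the one below. This is only a sketch of the workaround, assuming passwordless SSH from an admin host and a hypothetical workers.txt file listing one worker hostname per line:

    # Restart the spark-worker service on every host listed in workers.txt.
    # Assumes passwordless SSH and sudo rights on each worker node.
    while read -r host; do
        ssh "$host" 'sudo service spark-worker restart'
    done < workers.txt

Note that this still requires restart privileges on every node, which is exactly what the reporter wants to avoid handing out; the fix requested here is for workers to survive job failures rather than die.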