sam created SPARK-2019:
--------------------------
Summary: Spark workers die/disappear when job fails for nearly any reason
Key: SPARK-2019
URL: https://issues.apache.org/jira/browse/SPARK-2019
Project: Spark
Issue Type: Bug
Reporter: sam
When a job fails, the workers die or disappear, and we either have to reboot all the
nodes or run 'sudo service spark-worker restart' across our cluster. I don't think
this should happen - the job failures are often not even serious. There is a Stack
Overflow question with 5 upvotes describing the same problem:
http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
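For context, the cluster-wide restart we keep having to run is roughly the following
(a minimal sketch only; the workers.txt host list and SSH/sudo access on each node
are assumptions about the environment, not part of the Spark distribution):

    # Sketch: restart the standalone worker service on every node.
    # Assumes workers.txt lists one hostname per line, and that the user
    # has SSH access and passwordless sudo on each node.
    while read -r host; do
        ssh -n "$host" 'sudo service spark-worker restart'
    done < workers.txt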
We shouldn't be giving restart privileges to our devs, so our sysadmin has to restart
the workers frequently. When the sysadmin is not around, there is nothing our devs
can do.
Many thanks