Andrew Ash created SPARK-3736:
---------------------------------

             Summary: Workers should reconnect to Master if disconnected
                 Key: SPARK-3736
                 URL: https://issues.apache.org/jira/browse/SPARK-3736
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.2
            Reporter: Andrew Ash


In standalone mode, when a worker gets disconnected from the master for some 
reason, it never attempts to reconnect.  In this situation you have to restart 
(bounce) the worker before it will reconnect to the master.

The preferred alternative is to follow what Hadoop does -- when there's a 
disconnect, attempt to reconnect at a fixed interval until successful (I 
think it retries indefinitely every 10 seconds).
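
As a rough illustration of the retry behaviour described above, here is a 
minimal sketch in Python. It is not Spark's or Hadoop's actual code: the 
function name reconnect_until_success, the try_connect callback, and the 
max_attempts and sleep parameters are all hypothetical, and the 10-second 
default only mirrors the interval guessed at above.

```python
import time
from typing import Callable, Optional


def reconnect_until_success(
    try_connect: Callable[[], bool],
    interval_seconds: float = 10.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_connect until it returns True; return the number of attempts.

    try_connect is a hypothetical callback standing in for "register with
    the master". With max_attempts=None the loop retries indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if try_connect():
                return attempts
        except ConnectionError:
            # Treat a raised connection error the same as a failed attempt.
            pass
        if max_attempts is not None and attempts >= max_attempts:
            raise ConnectionError(f"gave up after {attempts} attempts")
        # Wait a fixed interval before the next attempt.
        sleep(interval_seconds)
```

The sleep function is passed in as a parameter so the loop can be exercised 
without real waiting; a worker would call this from its disconnect handler 
and leave max_attempts unset to keep retrying until the master comes back.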

This has been observed by:

- [~pkolaczk] in 
http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html
- [~romi-totango] in 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html
- [~aash]


