Andrew Ash created SPARK-3736:
---------------------------------
Summary: Workers should reconnect to Master if disconnected
Key: SPARK-3736
URL: https://issues.apache.org/jira/browse/SPARK-3736
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.2
Reporter: Andrew Ash
In standalone mode, when a worker is disconnected from the master for any
reason, it never attempts to reconnect. You then have to restart (bounce)
the worker before it will reconnect to the master.
The preferred alternative is to follow what Hadoop does: when there's a
disconnect, attempt to reconnect at a fixed interval until successful (I
think it retries indefinitely every 10 seconds).
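
To make the proposed behaviour concrete, here is a minimal sketch of a fixed-interval retry loop. It is only an illustration, not Spark's or Hadoop's actual code: the function name `reconnect_until_successful`, the `try_connect` callback, and the `max_attempts` and `sleep` parameters are all hypothetical, and the 10-second default is taken from the unverified guess above.

```python
import time
from typing import Callable, Optional


def reconnect_until_successful(
    try_connect: Callable[[], bool],
    interval_seconds: float = 10.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_connect at a fixed interval until it returns True.

    Returns the number of attempts made. With max_attempts=None the loop
    retries indefinitely, which is the behaviour proposed in this issue.
    All names here are hypothetical, chosen for illustration only.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        # Optional upper bound, mainly useful for tests and dry runs.
        if max_attempts is not None and attempts >= max_attempts:
            raise ConnectionError(f"gave up after {attempts} attempts")
        # Wait a fixed interval before the next attempt (no backoff).
        sleep(interval_seconds)
```

A worker would call something like this from its "disconnected" handler, passing a callback that tries to re-register with the master. Whether a fixed interval or a backoff is the better choice is left open here.
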
This has been observed by:
- [~pkolaczk] in
http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html
- [~romi-totango] in
http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html
- [~aash]