[ https://issues.apache.org/jira/browse/SPARK-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-4991:
-----------------------------------
Assignee: Apache Spark
> Worker should reconnect to Master when Master actor restart
> -----------------------------------------------------------
>
> Key: SPARK-4991
> URL: https://issues.apache.org/jira/browse/SPARK-4991
> Project: Spark
> Issue Type: Improvement
> Components: Deploy, Spark Core
> Affects Versions: 1.0.0, 1.1.0, 1.2.0
> Reporter: Zhang, Liye
> Assignee: Apache Spark
>
> This is a follow-up JIRA to
> [SPARK-4989|https://issues.apache.org/jira/browse/SPARK-4989]. When the Master
> akka actor encounters an exception, the Master restarts (an akka actor
> restart, not a JVM restart), and all old information on the Master is cleared
> (including workers, applications, etc.). However, the workers are not aware of
> this at all. The resulting cluster state is: the master is up and all
> workers are up, but the master is not aware that the workers exist and
> ignores every worker's heartbeat because no worker is registered. As a
> result, the whole cluster is unavailable.
> For more background on this part, please refer to
> [SPARK-3736|https://issues.apache.org/jira/browse/SPARK-3736] and
> [SPARK-4592|https://issues.apache.org/jira/browse/SPARK-4592]
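
To make the failure mode described in the issue concrete, here is a minimal toy model in Python. It is not Spark's code: the class names, method names, and the "reconnect" reply are all hypothetical, and the issue text itself does not say how the fix should be implemented. The sketch only assumes one plausible approach, in which the master answers a heartbeat from an unknown worker with a request to register again instead of silently ignoring it.

```python
# Toy model only; every name here is hypothetical and none of it is Spark API.

OK = "ok"
RECONNECT = "reconnect"


class Master:
    def __init__(self):
        # worker_id -> number of heartbeats received since registration
        self.workers = {}

    def restart(self):
        # Models an actor restart: the process stays up,
        # but all in-memory state (registered workers) is lost.
        self.workers.clear()

    def register(self, worker_id):
        self.workers[worker_id] = 0

    def heartbeat(self, worker_id):
        if worker_id in self.workers:
            self.workers[worker_id] += 1
            return OK
        # Assumed behaviour for this sketch: tell the unknown worker
        # to register again rather than dropping its heartbeat.
        return RECONNECT


class Worker:
    def __init__(self, worker_id, master):
        self.worker_id = worker_id
        self.master = master
        self.master.register(worker_id)

    def send_heartbeat(self):
        reply = self.master.heartbeat(self.worker_id)
        if reply == RECONNECT:
            # The master has forgotten this worker, so register again.
            self.master.register(self.worker_id)
        return reply


if __name__ == "__main__":
    master = Master()
    worker = Worker("worker-1", master)
    print(worker.send_heartbeat())   # registered, heartbeat accepted
    master.restart()                 # master forgets all workers
    print(worker.send_heartbeat())   # master asks the worker to reconnect
    print(worker.send_heartbeat())   # accepted again after re-registration
```

Without the reconnect reply, the second and third heartbeats in this model would both be dropped, which corresponds to the stuck state reported above.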