[
https://issues.apache.org/jira/browse/FLINK-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-3346.
--------------------------------
Resolution: Won't Do
No longer a problem since we removed the legacy code.
> Replace Akka death watch with own heartbeats
> --------------------------------------------
>
> Key: FLINK-3346
> URL: https://issues.apache.org/jira/browse/FLINK-3346
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.0.0
> Reporter: Till Rohrmann
> Priority: Minor
>
> Currently, we use Akka's death watch to detect failed instances (e.g.
> the JM watches the TM and vice versa). Whenever a death watch is triggered,
> the corresponding {{ActorSystem}} is quarantined. This prevents the
> quarantined {{ActorSystem}} from reconnecting to the detecting system, even
> if it did not actually die. This can happen occasionally with a sub-optimal
> death watch configuration.
> To prevent this, we could implement our own heartbeats, which are sent
> periodically from the {{TaskManagers}} to the {{JobManager}} and vice versa,
> for example. Not receiving a certain number of heartbeats could then cause
> the {{TaskManager}} to be marked as dead.
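
The heartbeat idea in the description can be illustrated with a minimal sketch. This is not Flink's or Akka's implementation: the class `HeartbeatMonitor`, its methods, and the threshold parameter `maxMissedHeartbeats` are hypothetical names chosen for illustration. It assumes a timer calls `intervalElapsed()` once per heartbeat interval and that the transport layer calls `heartbeatReceived()` when a heartbeat arrives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Flink or Akka.
final class HeartbeatMonitor {
    // Assumed threshold: missed intervals after which a peer counts as dead.
    private final int maxMissedHeartbeats;
    // Consecutive heartbeat intervals without a heartbeat, per peer.
    private final Map<String, Integer> missed = new HashMap<>();

    HeartbeatMonitor(int maxMissedHeartbeats) {
        if (maxMissedHeartbeats < 1) {
            throw new IllegalArgumentException("maxMissedHeartbeats must be >= 1");
        }
        this.maxMissedHeartbeats = maxMissedHeartbeats;
    }

    // Start tracking a peer, e.g. a TaskManager seen from the JobManager.
    void register(String peerId) {
        missed.put(peerId, 0);
    }

    // A heartbeat arrived: reset the counter for a registered peer.
    void heartbeatReceived(String peerId) {
        missed.computeIfPresent(peerId, (id, n) -> 0);
    }

    // One heartbeat interval passed: every tracked peer misses one more.
    void intervalElapsed() {
        missed.replaceAll((id, n) -> n + 1);
    }

    // A registered peer is dead once it missed the configured number of intervals.
    boolean isDead(String peerId) {
        Integer n = missed.get(peerId);
        return n != null && n >= maxMissedHeartbeats;
    }
}
```

In this sketch a late heartbeat simply resets the counter, so a peer that was slow rather than dead is tracked as alive again. That is the behaviour the description asks for, in contrast to a quarantine that blocks reconnection.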