[ 
https://issues.apache.org/jira/browse/FLINK-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-3346.
--------------------------------
    Resolution: Won't Do

No longer a problem since we removed the legacy code.

> Replace Akka death watch with own heartbeats
> --------------------------------------------
>
>                 Key: FLINK-3346
>                 URL: https://issues.apache.org/jira/browse/FLINK-3346
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Priority: Minor
>
> Currently, we're using Akka's death watch to detect failed instances (e.g.
> the JM watches the TM and vice versa). Whenever a death watch is triggered,
> the corresponding {{ActorSystem}} is quarantined. This prevents the
> quarantined {{ActorSystem}} from reconnecting to the detecting system, even
> if it did not actually die. Such false positives can happen once in a while
> given a sub-optimal death watch configuration.
> In order to prevent this from happening, we could implement our own 
> heartbeats which are periodically sent from the {{TaskManagers}} to the 
> {{JobManager}} and vice versa, for example. Not receiving a certain number of 
> heartbeats could then mean that the {{TaskManager}} is marked as dead.
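
For illustration only, the missed-heartbeat idea described above could be
sketched roughly as follows. This is not Flink's implementation: the class
name HeartbeatMonitor, its methods, and the parameters intervalMillis and
maxMissed are all hypothetical. The sketch also assumes that a late heartbeat
should make a target count as alive again, in contrast to the permanent
quarantine described in the issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical missed-heartbeat detector; a sketch, not Flink's actual code. */
class HeartbeatMonitor {
    private final long intervalMillis; // expected time between two heartbeats
    private final int maxMissed;       // missed intervals tolerated before "dead"
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long intervalMillis, int maxMissed) {
        if (intervalMillis <= 0 || maxMissed <= 0) {
            throw new IllegalArgumentException("interval and maxMissed must be positive");
        }
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
    }

    /** Records a heartbeat from the given target (e.g. a TaskManager id). */
    void receiveHeartbeat(String targetId, long nowMillis) {
        lastSeen.put(targetId, nowMillis);
    }

    /**
     * A target is considered dead once at least maxMissed full heartbeat
     * intervals have elapsed since its last heartbeat. Targets that never
     * sent a heartbeat are unknown and therefore not reported as dead.
     */
    boolean isDead(String targetId, long nowMillis) {
        Long last = lastSeen.get(targetId);
        if (last == null) {
            return false;
        }
        long missedIntervals = (nowMillis - last) / intervalMillis;
        return missedIntervals >= maxMissed;
    }

    /** Returns all known targets that are currently considered dead. */
    List<String> deadTargets(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (String targetId : lastSeen.keySet()) {
            if (isDead(targetId, nowMillis)) {
                dead.add(targetId);
            }
        }
        return dead;
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic. A real
component would more likely read a clock and run the check on a scheduled
timer.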



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
