[ 
https://issues.apache.org/jira/browse/FLINK-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650576#comment-15650576
 ] 

Zhijiang Wang commented on FLINK-4364:
--------------------------------------

Hi [~till.rohrmann], for the heartbeat interaction between TM and JM, the 
process is almost the same as with the RM, as we discussed before.
The TM will have a separate {{HeartbeatManagerImpl}} and 
{{HeartbeatListener}} dedicated to the JM heartbeat.
The TM will also start monitoring the {{HeartbeatTarget}} once it has 
registered successfully at a new JM found via the HA mechanism.

There are two issues to be confirmed:
1. If the TM detects the JM as dead by heartbeat timeout, the TM should not 
release the tasks and slots that belong to that JM. The TM should do nothing 
when notified of the heartbeat timeout. It will register at the new JM found 
via HA and offer the related slots if possible. This is part of the JM failure 
recovery process. If the JM detects a TM as dead by heartbeat timeout, it will 
release all slots related to that TM and request new ones from the RM.
2. For the payload, I am currently not sure what information needs to be 
reported by heartbeat. The JM may need the {{SlotPool}} to be consistent with 
the {{SlotOffer}}, and that also touches other processes. So I think we can 
deliver a null payload in the current implementation and just make the 
monitoring function work. Later we can expand the payload as needed.
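
To make the two points concrete, here is a minimal sketch of the asymmetric 
timeout handling and the null payload. It does not use the actual Flink 
interfaces: {{TimeoutListener}}, {{TaskManagerSide}}, {{JobManagerSide}}, 
{{reRegister}} and the string IDs are hypothetical names chosen only for 
illustration, and the slot bookkeeping is reduced to plain maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatSketch {

    /** Hypothetical, simplified listener; NOT the actual Flink interface. */
    interface TimeoutListener {
        void notifyHeartbeatTimeout(String peerId);
        Object retrievePayload(); // payload attached to each heartbeat
    }

    /** TM side: a JM timeout releases nothing; slots wait for the new leader. */
    static class TaskManagerSide implements TimeoutListener {
        final Map<String, List<String>> slotsByJobManager = new HashMap<>();

        void allocate(String jmId, String slotId) {
            slotsByJobManager.computeIfAbsent(jmId, k -> new ArrayList<>()).add(slotId);
        }

        @Override
        public void notifyHeartbeatTimeout(String jmId) {
            // Intentionally a no-op (point 1): tasks and slots are kept.
        }

        @Override
        public Object retrievePayload() {
            return null; // point 2: no payload in the first implementation
        }

        /** Assumed to be called once HA announces a new JM leader. */
        List<String> reRegister(String oldJmId, String newJmId) {
            List<String> slots = slotsByJobManager.remove(oldJmId);
            if (slots == null) {
                slots = new ArrayList<>();
            }
            slotsByJobManager.put(newJmId, slots);
            return new ArrayList<>(slots); // slots to offer to the new JM
        }
    }

    /** JM side: a TM timeout releases its slots and asks the RM for new ones. */
    static class JobManagerSide implements TimeoutListener {
        final Map<String, List<String>> slotsByTaskManager = new HashMap<>();
        int slotsToRequestFromRm = 0;

        void acceptOffer(String tmId, String slotId) {
            slotsByTaskManager.computeIfAbsent(tmId, k -> new ArrayList<>()).add(slotId);
        }

        @Override
        public void notifyHeartbeatTimeout(String tmId) {
            List<String> released = slotsByTaskManager.remove(tmId);
            if (released != null) {
                slotsToRequestFromRm += released.size();
            }
        }

        @Override
        public Object retrievePayload() {
            return null;
        }
    }

    public static void main(String[] args) {
        TaskManagerSide tm = new TaskManagerSide();
        tm.allocate("jm-1", "slot-0");
        tm.notifyHeartbeatTimeout("jm-1"); // nothing is released
        System.out.println("offered to new JM: " + tm.reRegister("jm-1", "jm-2"));

        JobManagerSide jm = new JobManagerSide();
        jm.acceptOffer("tm-1", "slot-0");
        jm.notifyHeartbeatTimeout("tm-1"); // slot released, re-request recorded
        System.out.println("slots to request from RM: " + jm.slotsToRequestFromRm);
    }
}
```

The only point of the sketch is the asymmetry: the TM listener ignores the 
timeout and keeps its slots for the next leader, while the JM listener drops 
the slots of the lost TM and records how many to request again.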

Do you think the above points are feasible? If so, I will work on it this week.

> Implement TaskManager side of heartbeat from JobManager
> -------------------------------------------------------
>
>                 Key: FLINK-4364
>                 URL: https://issues.apache.org/jira/browse/FLINK-4364
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Cluster Management
>            Reporter: Zhijiang Wang
>            Assignee: Zhijiang Wang
>
> The {{JobManager}} initiates heartbeat messages via (JobID, JmLeaderID), and 
> the {{TaskManager}} will report metrics info for each heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
