[
https://issues.apache.org/jira/browse/FLINK-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650576#comment-15650576
]
Zhijiang Wang commented on FLINK-4364:
--------------------------------------
Hi [~till.rohrmann], the heartbeat interaction between TM and JM is almost the
same as the one with the RM that we discussed before.
The TM will have a separate {{HeartbeatManagerImpl}} and {{HeartbeatListener}}
dedicated to the JM heartbeat.
The TM will also start monitoring the {{HeartbeatTarget}} once it has
successfully registered at a new JM found through the HA mechanism.
There are two issues to be confirmed:
1. If the TM detects the JM as dead by heartbeat timeout, it should not release
the tasks and slots that belong to that JM. The TM should do nothing when
notified of the heartbeat timeout. It will register at the new JM found through
HA and offer the related slots if possible. This is related to the JM failure
recovery process. If the JM detects a TM as dead by heartbeat timeout, it will
release all slots related to that TM and request slots from the RM again.
2. For the payload, I am currently not sure which information needs to be
reported by heartbeat. The JM may need its {{SlotPool}} to be consistent with
the {{SlotOffer}}, and this also affects other processes. So I think we can
deliver a null payload in the current implementation and just get the
monitoring function working. Later we can extend the payload as needed.
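To make point 1 and the null payload from point 2 concrete, here is a rough
sketch in plain Java. It is only an illustration under my own assumptions: the
{{TimeoutListener}} interface, the two classes and the slot bookkeeping are
hypothetical names for this sketch, not the actual {{HeartbeatListener}}
signatures in Flink.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatTimeoutSketch {

    /** Hypothetical, simplified listener; not Flink's real HeartbeatListener. */
    interface TimeoutListener<P> {
        void notifyHeartbeatTimeout(String peerId);

        /** Payload attached to each heartbeat; null for now (point 2). */
        P retrievePayload();
    }

    /** TM side: a JM heartbeat timeout leaves tasks and slots untouched. */
    static class TaskManagerSide implements TimeoutListener<Void> {
        // Assumed bookkeeping: slot indices allocated per JM.
        final Map<String, List<Integer>> slotsByJobManager = new HashMap<>();

        @Override
        public void notifyHeartbeatTimeout(String jobManagerId) {
            // Intentionally empty: the slots stay allocated so they can be
            // offered to the new JM once the TM has registered there via HA.
        }

        @Override
        public Void retrievePayload() {
            return null;
        }
    }

    /** JM side: a TM heartbeat timeout releases that TM's slots. */
    static class JobManagerSide implements TimeoutListener<Void> {
        // Assumed bookkeeping: slot indices held per TM.
        final Map<String, List<Integer>> slotsByTaskManager = new HashMap<>();
        // Number of slots that have to be requested from the RM again.
        int slotsToReRequest = 0;

        @Override
        public void notifyHeartbeatTimeout(String taskManagerId) {
            List<Integer> released = slotsByTaskManager.remove(taskManagerId);
            if (released != null) {
                slotsToReRequest += released.size();
            }
        }

        @Override
        public Void retrievePayload() {
            return null;
        }
    }

    public static void main(String[] args) {
        TaskManagerSide tm = new TaskManagerSide();
        tm.slotsByJobManager.put("jm-1", Arrays.asList(0, 1));
        tm.notifyHeartbeatTimeout("jm-1");
        System.out.println("TM slots kept for jm-1: " + tm.slotsByJobManager.get("jm-1"));

        JobManagerSide jm = new JobManagerSide();
        jm.slotsByTaskManager.put("tm-1", Arrays.asList(0, 1, 2));
        jm.notifyHeartbeatTimeout("tm-1");
        System.out.println("Slots to request from RM again: " + jm.slotsToReRequest);
    }
}
```

The asymmetry is the point here: the TM keeps its state so the slots can be
re-offered after JM recovery, while the JM drops the lost TM's slots right away.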
Do you think the above points are feasible? If so, I will work on it this week.
> Implement TaskManager side of heartbeat from JobManager
> -------------------------------------------------------
>
> Key: FLINK-4364
> URL: https://issues.apache.org/jira/browse/FLINK-4364
> Project: Flink
> Issue Type: Sub-task
> Components: Cluster Management
> Reporter: Zhijiang Wang
> Assignee: Zhijiang Wang
>
> The {{JobManager}} initiates heartbeat messages via (JobID, JmLeaderID), and
> the {{TaskManager}} will report metrics info for each heartbeat.