[ https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084675#comment-17084675 ]

Zhu Zhu commented on FLINK-17075:
---------------------------------

Regarding the approach of reporting task states via heartbeat payloads, my main
concerns are:
1. We may need to deal with inconsistencies between the states reported by
{{updateTaskExecutionState}} and the states reported by the heartbeat.
2. The TM may need to keep the task states even after the tasks have terminated.
Otherwise, if a state is missing, the JM can hardly tell whether a task has not
been received yet or has already terminated, nor what the exact terminal state
(CANCELED/FAILED/FINISHED) was. The states could be cleared only after the TM
is disassociated from that JM/job.
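To make the second concern concrete, here is a minimal sketch of TM-side bookkeeping that retains terminal states. It assumes hypothetical names that are not Flink's actual classes: `TaskStateRegistry`, a simplified `TaskState` enum standing in for Flink's execution states, and plain string job and attempt IDs. The TaskExecutor keeps the last known state of every task attempt per job, terminal ones included, and drops them only when it is disassociated from that job's JM.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified task states; NOT Flink's ExecutionState enum.
enum TaskState {
    DEPLOYING, RUNNING, FINISHED, CANCELED, FAILED;

    boolean isTerminal() {
        return this == FINISHED || this == CANCELED || this == FAILED;
    }
}

// Hypothetical TaskExecutor-side registry: jobId -> (attemptId -> last known state).
final class TaskStateRegistry {
    private final Map<String, Map<String, TaskState>> statesByJob = new HashMap<>();

    // Record a state change; once an attempt is terminal, later updates are ignored.
    synchronized void update(String jobId, String attemptId, TaskState state) {
        statesByJob.computeIfAbsent(jobId, k -> new HashMap<>())
                .merge(attemptId, state, (old, next) -> old.isTerminal() ? old : next);
    }

    // Snapshot to attach to the heartbeat payload for this job's JM.
    // Terminal states are still included, so "finished" is distinguishable from "never received".
    synchronized Map<String, TaskState> heartbeatPayload(String jobId) {
        Map<String, TaskState> states = statesByJob.get(jobId);
        if (states == null) {
            return Collections.emptyMap();
        }
        return new HashMap<>(states); // copy, so later updates do not mutate the sent payload
    }

    // Only once the TM is disassociated from the job's JM can the retained states be dropped.
    synchronized void disassociate(String jobId) {
        statesByJob.remove(jobId);
    }
}
```

Because a terminal state is never overwritten and entries survive until `disassociate`, a heartbeat sent after a task has ended still carries FINISHED, FAILED or CANCELED, which is exactly the information the JM would otherwise lack. The trade-off is TM memory that grows with the number of attempts of a job until the job is released.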

> Add task status reconciliation between TM and JM
> ------------------------------------------------
>
>                 Key: FLINK-17075
>                 URL: https://issues.apache.org/jira/browse/FLINK-17075
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> In order to harden the TM and JM communication, I suggest letting the
> {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of
> the heartbeat payload (similar to FLINK-11059). This would allow reconciling
> the states of both components in case a status update message was lost, as
> described by a user on the ML.
> https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E
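
A JobMaster-side reconciliation step for such a heartbeat payload could look roughly like the following sketch. All names (`TaskStateReconciler`, `Result`, the simplified `TaskState` enum) are hypothetical and not Flink's API, and it assumes the TaskExecutor reports the states of all attempts of the job that it knows about, including terminal ones. It compares the JM's view of the attempts placed on one TM with that TM's report and sorts the differences into lost updates, tasks missing on the TM, and tasks unknown to the JM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified task states; NOT Flink's ExecutionState enum.
// Declaration order doubles as the "progress" order used below.
enum TaskState {
    DEPLOYING, RUNNING, FINISHED, CANCELED, FAILED;

    boolean isTerminal() {
        return this == FINISHED || this == CANCELED || this == FAILED;
    }
}

// Hypothetical JM-side reconciliation of its view against one TM's heartbeat report.
final class TaskStateReconciler {

    static final class Result {
        final Map<String, TaskState> statesToApply = new HashMap<>(); // lost updates to replay on the JM
        final List<String> missingOnTm = new ArrayList<>();          // JM expects them, TM does not report them
        final List<String> unknownToJm = new ArrayList<>();          // TM reports them, JM does not expect them
    }

    // jmView: attempts the JM believes are placed on this TM; tmReport: the heartbeat payload.
    static Result reconcile(Map<String, TaskState> jmView, Map<String, TaskState> tmReport) {
        Result result = new Result();
        for (Map.Entry<String, TaskState> e : jmView.entrySet()) {
            String attemptId = e.getKey();
            TaskState jmState = e.getValue();
            TaskState tmState = tmReport.get(attemptId);
            if (tmState == null) {
                // Only meaningful because the TM retains terminal states; otherwise a finished task looks lost.
                if (!jmState.isTerminal()) {
                    result.missingOnTm.add(attemptId);
                }
            } else if (jmState.isTerminal()) {
                // A terminal state on the JM is final; a stale snapshot must not override it.
                continue;
            } else if (tmState.ordinal() > jmState.ordinal()) {
                // TM is ahead, e.g. an update sent via updateTaskExecutionState was lost.
                result.statesToApply.put(attemptId, tmState);
            }
        }
        for (String attemptId : tmReport.keySet()) {
            if (!jmView.containsKey(attemptId)) {
                result.unknownToJm.add(attemptId);
            }
        }
        return result;
    }
}
```

Applying only forward transitions is one way to deal with the inconsistency between {{updateTaskExecutionState}} and the heartbeat: whichever channel delivers the later state wins, and an older heartbeat snapshot is ignored. The sketch does not handle a task whose submission is still in flight when the TM takes its snapshot; such an attempt would land in `missingOnTm`, so a real implementation would likely need something like a grace period before acting on that list.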



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
