[ 
https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086706#comment-17086706
 ] 

Zhu Zhu commented on FLINK-17075:
---------------------------------

Thanks for the explanation! I think the heartbeat approach could work well as 
a safety net.
Agreed that it's best to have both approaches. With the heartbeat payload as a 
safety net, state updates only need a finite number of retries, which is 
simpler than retrying indefinitely.
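
To make the combination concrete, here is a minimal sketch of how finite 
retries and heartbeat-based reconciliation could fit together. All names in it 
(TaskStatusReconciliation, StatusUpdateSender, sendWithFiniteRetries, 
reconcile) are hypothetical and are not Flink's actual classes or APIs. It also 
assumes that the status reported by the TM simply wins, which a real 
implementation may not want in every case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; these are not Flink's real classes or APIs. */
public class TaskStatusReconciliation {

    /** Simplified task states, for illustration only. */
    public enum TaskState { RUNNING, FINISHED, FAILED }

    /** Hypothetical transport; returns true once the JM acknowledged the update. */
    public interface StatusUpdateSender {
        boolean send(String taskId, TaskState state);
    }

    /**
     * TM side: try the status update a bounded number of times. If every
     * attempt fails, give up and rely on the heartbeat payload instead.
     */
    public static boolean sendWithFiniteRetries(
            StatusUpdateSender sender, String taskId, TaskState state, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sender.send(taskId, state)) {
                return true;
            }
        }
        return false;
    }

    /**
     * JM side: compare the JM's view with the statuses carried in the
     * heartbeat payload and adopt the reported status where they differ
     * (assumption: the TM's report wins). Returns the corrected entries.
     */
    public static Map<String, TaskState> reconcile(
            Map<String, TaskState> jmView, Map<String, TaskState> heartbeatPayload) {
        Map<String, TaskState> corrected = new HashMap<>();
        for (Map.Entry<String, TaskState> reported : heartbeatPayload.entrySet()) {
            TaskState known = jmView.get(reported.getKey());
            if (known != reported.getValue()) {
                jmView.put(reported.getKey(), reported.getValue());
                corrected.put(reported.getKey(), reported.getValue());
            }
        }
        return corrected;
    }
}
```

In this sketch a lost FINISHED update leaves the JM's view at RUNNING until the 
next heartbeat arrives, at which point reconcile() corrects it. Tasks that the 
JM knows about but the TM no longer reports are not handled here.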

> Add task status reconciliation between TM and JM
> ------------------------------------------------
>
>                 Key: FLINK-17075
>                 URL: https://issues.apache.org/jira/browse/FLINK-17075
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> In order to harden the TM and JM communication, I suggest letting the 
> {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of 
> the heartbeat payload (similar to FLINK-11059). This would allow the states 
> of both components to be reconciled in case a status update message is lost, 
> as described by a user on the mailing list:
> https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
