[ https://issues.apache.org/jira/browse/FLINK-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333998#comment-16333998 ]
ASF GitHub Bot commented on FLINK-8462:
---------------------------------------
Github user GJL commented on a diff in the pull request:
https://github.com/apache/flink/pull/5318#discussion_r162867279
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ---
@@ -1337,11 +1340,16 @@ public void reportPayload(ResourceID resourceID, Void payload) {
         @Override
         public void notifyHeartbeatTimeout(final ResourceID resourceId) {
             runAsync(() -> {
-                log.info("The heartbeat of ResourceManager with id {} timed out.", resourceId);
+                // first check whether the timeout is still valid
+                if (resourceManagerConnection != null && resourceManagerConnection.getResourceManagerId().equals(resourceId)) {
+                    log.info("The heartbeat of ResourceManager with id {} timed out.", resourceId);
-                closeResourceManagerConnection(
-                    new TimeoutException(
-                        "The heartbeat of ResourceManager with id " + resourceId + " timed out."));
+                    closeResourceManagerConnection(
+                        new TimeoutException(
+                            "The heartbeat of ResourceManager with id " + resourceId + " timed out."));
+                } else {
+                    log.debug("Received heartbeat timeout for outdated ResourceManager connection {}. Ignoring the timeout.", resourceId);
--- End diff --
nit: *ResourceManager with id* vs. *ResourceManager connection {}*
Both messages log the same argument, but one calls it a ResourceManager and
the other calls it a connection.
> TaskExecutor does not verify RM heartbeat timeouts
> --------------------------------------------------
>
> Key: FLINK-8462
> URL: https://issues.apache.org/jira/browse/FLINK-8462
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.0
>
>
> The {{TaskExecutor}} neither properly stops RM heartbeats nor checks
> whether an RM heartbeat timeout is still valid. As a consequence, the
> {{TaskExecutor}} can close the connection to an active {{RM}} because of
> an outdated heartbeat timeout.
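
The stale-timeout check described above can be illustrated with a minimal, self-contained sketch. This is not Flink code: the class HeartbeatTimeoutGuard, its methods, and the use of plain strings as ResourceManager ids are all hypothetical stand-ins, chosen only to show why a timeout should be compared against the current connection before anything is closed.

```java
// Hypothetical, simplified stand-in for the TaskExecutor's handling of
// ResourceManager heartbeat timeouts. Names and types are assumptions made
// for illustration; they do not mirror Flink's actual classes.
final class HeartbeatTimeoutGuard {

    // Id of the currently connected ResourceManager, or null if disconnected.
    private String currentResourceManagerId;

    // Registers a (new) ResourceManager connection, replacing any old one.
    void connect(String resourceManagerId) {
        currentResourceManagerId = resourceManagerId;
    }

    boolean isConnected() {
        return currentResourceManagerId != null;
    }

    // Handles a heartbeat timeout for the given ResourceManager id.
    // Returns true if the timeout referred to the current connection and the
    // connection was closed; returns false if the timeout was outdated and
    // therefore ignored.
    boolean onHeartbeatTimeout(String timedOutResourceManagerId) {
        if (currentResourceManagerId != null
                && currentResourceManagerId.equals(timedOutResourceManagerId)) {
            // The timeout is still valid: drop the connection.
            currentResourceManagerId = null;
            return true;
        }
        // Timeout belongs to an earlier connection: keep the active one.
        return false;
    }

    public static void main(String[] args) {
        HeartbeatTimeoutGuard guard = new HeartbeatTimeoutGuard();
        guard.connect("rm-1");
        guard.connect("rm-2"); // a new ResourceManager took over

        // A late timeout for the old ResourceManager must not close rm-2.
        System.out.println("stale timeout closed connection: "
                + guard.onHeartbeatTimeout("rm-1"));
        System.out.println("still connected: " + guard.isConnected());
    }
}
```

Without the id comparison, the late timeout for the first ResourceManager in this sketch would tear down the connection to the second one, which is the failure mode the issue reports.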
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)