Github user GJL commented on a diff in the pull request:
https://github.com/apache/flink/pull/5318#discussion_r162867279
--- Diff:
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
---
@@ -1337,11 +1340,16 @@ public void reportPayload(ResourceID resourceID,
Void payload) {
@Override
public void notifyHeartbeatTimeout(final ResourceID resourceId)
{
runAsync(() -> {
- log.info("The heartbeat of ResourceManager with
id {} timed out.", resourceId);
+ // first check whether the timeout is still
valid
+ if (resourceManagerConnection != null &&
resourceManagerConnection.getResourceManagerId().equals(resourceId)) {
+ log.info("The heartbeat of
ResourceManager with id {} timed out.", resourceId);
- closeResourceManagerConnection(
- new TimeoutException(
- "The heartbeat of
ResourceManager with id " + resourceId + " timed out."));
+ closeResourceManagerConnection(
+ new TimeoutException(
+ "The heartbeat of
ResourceManager with id " + resourceId + " timed out."));
+ } else {
+ log.debug("Received heartbeat timeout
for outdated ResourceManager connection {}. Ignoring the timeout.", resourceId);
--- End diff --
nit: *ResourceManager with id* vs *ResourceManager connection {}*
Same argument is logged but one is called a connection, the other one is
called RM.
---