[ https://issues.apache.org/jira/browse/FLINK-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333996#comment-16333996 ]

ASF GitHub Bot commented on FLINK-8462:
---------------------------------------

Github user GJL commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5318#discussion_r162865803
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ---
    @@ -717,15 +717,14 @@ private void notifyOfNewResourceManagerLeader(String newLeaderAddress, ResourceM
                        }
     
                        // drop the current connection or connection attempt
    -                   if (resourceManagerConnection != null) {
    -                           resourceManagerConnection.close();
    -                           resourceManagerConnection = null;
    -                   }
    +                   closeResourceManagerConnection(
    +                           new FlinkException("New ResourceManager leader found under: " + newLeaderAddress +
    +                                   '(' + newResourceManagerId + ')'));
                }
     
                // establish a connection to the new leader
                if (newLeaderAddress != null) {
    -                   log.info("Attempting to register at ResourceManager {}", newLeaderAddress);
    +                   log.info("Attempting to register at ResourceManager {}({})", newLeaderAddress, newResourceManagerId);
    --- End diff --
    
    nit: a space after the logged `newLeaderAddress` wouldn't hurt: `{} ({})`


> TaskExecutor does not verify RM heartbeat timeouts
> --------------------------------------------------
>
>                 Key: FLINK-8462
>                 URL: https://issues.apache.org/jira/browse/FLINK-8462
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> The {{TaskExecutor}} neither properly stops RM heartbeats nor checks whether 
> an RM heartbeat timeout is still valid. As a consequence, the 
> {{TaskExecutor}} can close the connection to an active {{RM}} because of an 
> outdated heartbeat timeout.
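
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the Flink implementation: `RmConnectionTracker`, its methods, and the string IDs are hypothetical stand-ins for the TaskExecutor's connection state. It only shows the two checks the issue asks for: stop monitoring the old RM when its connection is closed, and ignore a timeout that does not belong to the currently connected RM.

```java
public class HeartbeatTimeoutSketch {

    /** Hypothetical stand-in for a TaskExecutor's view of its ResourceManager connection. */
    static final class RmConnectionTracker {
        private final java.util.Set<String> monitoredTargets = new java.util.HashSet<>();
        private String connectedRmId; // null while no RM is connected

        /** Switches to a new RM, dropping any previous connection first. */
        void connect(String rmId) {
            closeConnection();
            connectedRmId = rmId;
            monitoredTargets.add(rmId); // start heartbeat monitoring for the new RM
        }

        /** Closes the current connection and stops heartbeat monitoring for it. */
        void closeConnection() {
            if (connectedRmId != null) {
                monitoredTargets.remove(connectedRmId);
                connectedRmId = null;
            }
        }

        /**
         * Handles a heartbeat timeout. Returns true only if the timeout belonged
         * to the currently connected RM and the connection was therefore closed.
         */
        boolean onHeartbeatTimeout(String timedOutRmId) {
            if (connectedRmId == null || !connectedRmId.equals(timedOutRmId)) {
                return false; // outdated timeout: keep the active connection
            }
            closeConnection();
            return true;
        }

        boolean isMonitoring(String rmId) {
            return monitoredTargets.contains(rmId);
        }

        String connectedRmId() {
            return connectedRmId;
        }
    }

    public static void main(String[] args) {
        RmConnectionTracker tracker = new RmConnectionTracker();
        tracker.connect("rm-old");
        tracker.connect("rm-new"); // leader change: the old RM is no longer monitored

        // A late timeout for the old RM must not close the new connection.
        boolean closed = tracker.onHeartbeatTimeout("rm-old");
        System.out.println("closed=" + closed + ", connected=" + tracker.connectedRmId());
    }
}
```

Without the ID comparison in `onHeartbeatTimeout`, the late timeout for `rm-old` would close the connection to `rm-new`, which is the behaviour the issue reports.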



