Mark Payne created NIFI-12221:
---------------------------------

             Summary: Make heartbeat responses more lenient in some cases
                 Key: NIFI-12221
                 URL: https://issues.apache.org/jira/browse/NIFI-12221
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne
             Fix For: 2.latest


When a heartbeat is received by the Cluster Coordinator, it responds based on 
the node's current connection state. In the case of a disconnected node, it 
either notifies the node that it is disconnected so that it will stop sending 
heartbeats, or it asks the node to reconnect to the cluster.

Due to changes that were made in 1.16, as well as a few additional changes that 
have been made since, we can be much more lenient about when we ask the node to 
reconnect vs. disconnect. For example, if a node was disconnected due to not 
handling an update request, we previously needed to request that the node 
disconnect again. However, now we can ask the node to reconnect, as it may well 
be able to reconcile any differences and rejoin.

We currently even request that a node disconnect when we receive a heartbeat from 
a node whose last state was "Disconnected because Node was Shutdown". We should 
definitely be more lenient in this case, as it's occasionally causing System 
Test failures (e.g., 
https://github.com/apache/nifi/actions/runs/6498488206).
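
To make the intended behavior concrete, here is a minimal sketch of the decision 
described above. It is not taken from the NiFi codebase: the class, enum, and 
method names are all hypothetical, and treating a user-requested disconnect as 
the case that stays strict is an assumption made only for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from the NiFi codebase.
final class HeartbeatResponseSketch {

    // Connection state the coordinator has on record for the heartbeating node.
    enum ConnectionState { CONNECTED, CONNECTING, DISCONNECTED }

    // Why the node was last disconnected (illustrative subset only).
    enum DisconnectReason { USER_REQUESTED, FAILED_UPDATE_REQUEST, NODE_SHUTDOWN }

    // What the coordinator sends back in reply to the heartbeat.
    enum Response { ACKNOWLEDGE, NOTIFY_DISCONNECTED, REQUEST_RECONNECT }

    // Assumption: reasons for which the node is allowed to try rejoining.
    private static final Set<DisconnectReason> LENIENT_REASONS = EnumSet.of(
            DisconnectReason.FAILED_UPDATE_REQUEST,
            DisconnectReason.NODE_SHUTDOWN);

    static Response respond(ConnectionState state, DisconnectReason reason) {
        if (state != ConnectionState.DISCONNECTED) {
            // Node is connected or connecting; nothing special to do.
            return Response.ACKNOWLEDGE;
        }
        // Disconnected node: ask it to reconnect when the reason is one we
        // can be lenient about, otherwise tell it to stop heartbeating.
        return LENIENT_REASONS.contains(reason)
                ? Response.REQUEST_RECONNECT
                : Response.NOTIFY_DISCONNECTED;
    }
}
```

Under these assumptions, a heartbeat from a node last seen as "Disconnected 
because Node was Shutdown" yields a reconnect request rather than another 
disconnect notice.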



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
