Mark Payne created NIFI-7866:
--------------------------------

             Summary: When cluster coordinator dies, other nodes may have 
trouble rejoining cluster
                 Key: NIFI-7866
                 URL: https://issues.apache.org/jira/browse/NIFI-7866
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
            Reporter: Mark Payne


When the cluster coordinator is lost, the nodes must now begin communicating 
with a newly elected Cluster Coordinator. This is handled through the 
StandardFlowService.

When the `handleReconnectionRequest` method is called and the request provided 
does not contain the dataflow, the node is to connect to the cluster 
coordinator and request the dataflow:
{code:java}
private void handleReconnectionRequest(final ReconnectionRequestMessage request) {
    try {
        logger.info("Processing reconnection request from cluster coordinator.");

        // reconnect
        ConnectionResponse connectionResponse = new ConnectionResponse(getNodeId(), request.getDataFlow(),
                request.getInstanceId(), request.getNodeConnectionStatuses(), request.getComponentRevisions());

        if (connectionResponse.getDataFlow() == null) {
            logger.info("Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.");
            connectionResponse = connect(false, false, createDataFlowFromController());
        }

        loadFromConnectionResponse(connectionResponse);

... {code}
However, if the call above to `connect(false, false, 
createDataFlowFromController())` returns null (which is a valid case), that 
null value is passed along to `loadFromConnectionResponse`. This method 
expects a non-null connectionResponse and throws a NullPointerException, 
resulting in the following stack trace (based on NiFi 1.11.4):
{code:java}
2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
	at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
	at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
	at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
	at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
	at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
	... 4 common frames omitted
{code}
This results in the node not reconnecting to the cluster.
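
For illustration only, here is a minimal sketch of one way the null case could be guarded before the response is loaded. It is not the actual NiFi code or the fix for this issue: `ReconnectSketch`, its simplified `ConnectionResponse`, the `Connector` interface, and the `resolveResponse` helper are all hypothetical stand-ins, and throwing `IllegalStateException` is an assumed way of reporting the failure.

```java
/** Hypothetical, simplified stand-ins for the types named in this issue; not the real NiFi classes. */
final class ReconnectSketch {

    /** Stand-in for ConnectionResponse: carries only a (possibly absent) dataflow. */
    static final class ConnectionResponse {
        private final String dataFlow;

        ConnectionResponse(final String dataFlow) {
            this.dataFlow = dataFlow;
        }

        String getDataFlow() {
            return dataFlow;
        }
    }

    /** Stand-in for the connect(...) call, which may legitimately return null. */
    interface Connector {
        ConnectionResponse connect();
    }

    /**
     * Picks the response to load: the one built from the request if it has a
     * dataflow, otherwise whatever the connector returns. A null result is
     * reported explicitly instead of being passed on to the loading step.
     */
    static ConnectionResponse resolveResponse(final ConnectionResponse fromRequest, final Connector connector) {
        ConnectionResponse response = fromRequest;
        if (response.getDataFlow() == null) {
            // No dataflow in the request: ask the cluster, as the original method does.
            response = connector.connect();
        }
        if (response == null) {
            // Assumed handling: fail with a clear message rather than a NullPointerException.
            throw new IllegalStateException("Could not obtain a connection response from the cluster");
        }
        return response;
    }

    public static void main(final String[] args) {
        final ConnectionResponse withoutFlow = new ConnectionResponse(null);

        // The connector succeeds, so its response is the one that would be loaded.
        final ConnectionResponse loaded = resolveResponse(withoutFlow, () -> new ConnectionResponse("flow"));
        System.out.println("Resolved dataflow: " + loaded.getDataFlow());

        // The connector returns null, so the failure is reported explicitly.
        try {
            resolveResponse(withoutFlow, () -> null);
        } catch (final IllegalStateException e) {
            System.out.println("Reconnect failed: " + e.getMessage());
        }
    }
}
```

With a check of this kind, the failure would surface as an explicit message at the point where the connection attempt returned nothing, rather than as a NullPointerException inside the loading step. Whether the node should then retry, and how, is a separate design question that this sketch does not address.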



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
