Xinyu Wang created NIFI-15906:
---------------------------------

             Summary: Cluster reconnect inheritance throws 
IllegalStateException when a connection's destination is a running 
RemoteGroupPort whose versionedComponentId is null
                 Key: NIFI-15906
                 URL: https://issues.apache.org/jira/browse/NIFI-15906
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 2.5.0
            Reporter: Xinyu Wang


*Symptom*

  When a cluster node performs reconnect inheritance and the local flow 
contains a Connection whose destination is a RemoteGroupPort (RGP) with 
transmission=ON, the synchronizer aborts with:

  ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed
  org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption.
      at o.a.n.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:947)
      ...
  Caused by: java.lang.IllegalStateException: Cannot change destination of Connection because the current destination ([RemoteGroupPort[name=TARGET_PORT,targets=https://nifi-1:8443]]) is running
      at o.a.n.connectable.StandardConnection.setDestination(StandardConnection.java:296)
      at o.a.n.flow.synchronization.StandardVersionedComponentSynchronizer.updateConnectionDestinations(StandardVersionedComponentSynchronizer.java:863)
      at o.a.n.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:573)
      ...

  The node is then marked DISCONNECTED (Disconnect Code = Node's Flow did not 
Match Cluster Flow) and requires manual intervention to rejoin.

*Steps to Reproduce*

  Minimal repro on a fresh two-node 2.5.0 cluster:

  1. Build the following minimal flow on node-1 via REST API:
    - An InputPort with allowRemoteAccess=true (the "target port")
    - A Funnel connected from the InputPort (so the InputPort can be started)
    - A RemoteProcessGroup (RPG) whose targetUris points back at the same 
cluster (https://node-1:8443), i.e. a loopback RPG
    - Wait for the RPG to discover the target port via the S2S handshake (this 
creates a local RemoteGroupPort instance with versionedComponentId = null)
    - Add a GenerateFlowFile processor with a connection whose destination is 
the discovered RemoteGroupPort
    - Enable RemoteProcessGroup transmission (the RemoteGroupPort becomes RUNNING)
  2. Disconnect node-2: PUT /controller/cluster/nodes/{id} with status: 
DISCONNECTING. Wait until DISCONNECTED.
  3. Immediately reconnect node-2: PUT /controller/cluster/nodes/{id} with 
status: CONNECTING.
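
  Steps 2 and 3 can be scripted. The sketch below only builds the two PUT 
requests and does not send them. The /nifi-api base path, the helper name 
node_status_request, and the request body shape (a "node" object carrying 
nodeId and status) are assumptions that are not stated in this report and 
should be checked against the REST API documentation for the NiFi version 
in use. Authentication and TLS setup are omitted.

```python
import json
import urllib.request


def node_status_request(base_url, node_id, status):
    """Build (but do not send) a PUT request that changes a cluster node's status.

    base_url is assumed to include the REST prefix, e.g. https://node-1:8443/nifi-api.
    The body shape {"node": {"nodeId": ..., "status": ...}} is an assumption.
    """
    if status not in ("DISCONNECTING", "CONNECTING"):
        raise ValueError(f"unsupported status for this repro: {status}")
    body = {"node": {"nodeId": node_id, "status": status}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/controller/cluster/nodes/{node_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Step 2, then step 3, for a hypothetical node id.
disconnect = node_status_request("https://node-1:8443/nifi-api", "NODE2_ID", "DISCONNECTING")
reconnect = node_status_request("https://node-1:8443/nifi-api", "NODE2_ID", "CONNECTING")
```

  Sending each request with urllib.request.urlopen (plus whatever credentials 
the cluster requires), and polling until node-2 reports DISCONNECTED between 
the two calls, would complete the repro.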

  Note: The bug fires on any reconnect that triggers inheritance, as long as 
the local RGP has versionedComponentId = null and is RUNNING.
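
  To check whether a node is exposed before reconnecting it, the condition in 
the note can be written as a predicate over a connection entity fetched from 
that node. This is a rough sketch, not a confirmed diagnostic: the field names 
(component.destination.type, transmitting, versionedComponentId) and the 
REMOTE_INPUT_PORT type value are assumptions about the REST representation, 
and the function name is hypothetical.

```python
def destination_is_at_risk(connection_entity):
    """Return True if the connection's destination matches the failing condition:
    a remote input port that is transmitting and has no versionedComponentId.

    Field names are assumptions about the connection entity JSON, not taken
    from this report.
    """
    dest = connection_entity.get("component", {}).get("destination", {})
    return (
        dest.get("type") == "REMOTE_INPUT_PORT"
        and bool(dest.get("transmitting"))
        and dest.get("versionedComponentId") is None
    )


# Hand-written sample entities, for illustration only.
sample_at_risk = {
    "component": {
        "destination": {
            "type": "REMOTE_INPUT_PORT",
            "name": "TARGET_PORT",
            "transmitting": True,
            "versionedComponentId": None,
        }
    }
}
sample_stopped = {
    "component": {
        "destination": {
            "type": "REMOTE_INPUT_PORT",
            "name": "TARGET_PORT",
            "transmitting": False,
            "versionedComponentId": None,
        }
    }
}
```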

*Expected*: node-2 reconnects and reaches CONNECTED.

*Actual*: node-2 logs IllegalStateException: Cannot change destination of 
Connection ... within ~250 ms of the reconnect, transitions to DISCONNECTED 
with Node's Flow did not Match Cluster Flow, and stays disconnected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
