[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833713#comment-17833713
 ] 

Joe Witt commented on NIFI-12969:
---------------------------------

Thanks [~pgyori]. I will flag this as a key issue to look into for 1.26.0 and 
2.0.0-M3. If triage shows it to be manageable we can relax that, but otherwise 
it will get some attention before we release.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-12969
>                 URL: https://issues.apache.org/jira/browse/NIFI-12969
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.24.0, 2.0.0-M2
>            Reporter: Nissim Shiman
>            Assignee: Nissim Shiman
>            Priority: Critical
>             Fix For: 2.0.0-M3, 1.26.0
>
>         Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>
> Under heavy load, if a node leaves the cluster (due to a heartbeat timeout), 
> it is often unable to rejoin the cluster.
> The node's graph will also have been modified with a temp-funnel.
> This appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a NiFi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute, terminate the success relationship.
> One of the GenerateFlowFile processors can be disabled;
> the other one should have its Run Schedule set to 0 min (this generates the 
> heavy load).
>  # In nifi.properties (on all 3 nodes), to allow nodes to fall out of the 
> cluster, set:
> nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5 sec)
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart NiFi. Start the flow. The nodes will quickly fall out of and rejoin 
> the cluster. After a few minutes, one will likely not be able to rejoin. The 
> graph for that node will have the disabled GenerateFlowFile now pointing to a 
> funnel (a temp-funnel) instead of the PG.
> The stack trace in that node's nifi-app.log will look like this (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to properly handle Reconnection request due to org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption.
>         at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
>         at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
>         at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
>         at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: org.apache.nifi.controller.serialization.FlowSynchronizationException: java.lang.IllegalStateException: Cannot change destination of Connection because FlowFiles from this Connection are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, type=INPUT_PORT, name=inputPort, group=innerPG]
>         at org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
>         at org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
>         at org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
>         at org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
>         at org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
>         at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
>         ... 3 common frames omitted
> Caused by: java.lang.IllegalStateException: Cannot change destination of Connection because FlowFiles from this Connection are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, type=INPUT_PORT, name=inputPort, group=innerPG]
>         at org.apache.nifi.connectable.StandardConnection.setDestination(StandardConnection.java:299)
>         at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.updateConnectionDestinations(StandardVersionedComponentSynchronizer.java:705)
>         at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:423)
>         at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.lambda$synchronize$0(StandardVersionedComponentSynchronizer.java:248)
>         at org.apache.nifi.controller.flow.AbstractFlowManager.withParameterContextResolution(AbstractFlowManager.java:638)
>         at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:243)
>         at org.apache.nifi.groups.StandardProcessGroup.synchronizeFlow(StandardProcessGroup.java:3860)
>         at org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:464)
>         ... 8 common frames omitted
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator machine-name-2.organization.org:8443 requested disconnection from cluster due to org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption.
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator Status of <machine-name-2.organization>.org:8443 changed from NodeConnectionStatus[nodeId=<machine-name-2.organization>.org:8443, state=CONNECTING, updateId=852] to NodeConnectionStatus[nodeId=<machine-name-2.organization>.org:8443, state=DISCONNECTED, Disconnect Code=Node's Flow did not Match Cluster Flow, Disconnect Reason=org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption., updateId=854]
> {code}
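
For readers trying to picture the timing dependence behind the IllegalStateException in the trace above, here is a minimal sketch. It is a toy model only, not NiFi's actual code: the Port and Connection classes and the heldFlowFiles counter are hypothetical stand-ins, and the only thing taken from the issue is the error message text. It shows that the same destination swap can be rejected or accepted depending purely on whether the old destination happens to be holding FlowFiles at that instant.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DestinationSwapSketch {

    /** Hypothetical stand-in for a port that may be holding FlowFiles mid-transfer. */
    static final class Port {
        final String name;
        final AtomicInteger heldFlowFiles = new AtomicInteger();

        Port(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for a connection whose destination can be swapped. */
    static final class Connection {
        private Port destination;

        Connection(Port destination) {
            this.destination = destination;
        }

        /** Rejects the swap while the current destination still holds FlowFiles. */
        synchronized void setDestination(Port newDestination) {
            Port previous = destination;
            if (previous.heldFlowFiles.get() > 0) {
                throw new IllegalStateException(
                        "Cannot change destination of Connection because FlowFiles from "
                                + "this Connection are currently held by " + previous.name);
            }
            destination = newDestination;
        }

        synchronized Port getDestination() {
            return destination;
        }
    }

    public static void main(String[] args) {
        Port inputPort = new Port("inputPort");
        Port tempFunnel = new Port("temp-funnel");
        Connection connection = new Connection(inputPort);

        // Under heavy load the port is likely to be mid-transfer when the swap is attempted.
        inputPort.heldFlowFiles.incrementAndGet();
        try {
            connection.setDestination(tempFunnel);
        } catch (IllegalStateException e) {
            System.out.println("swap rejected: " + e.getMessage());
        }

        // Once the port has released its FlowFiles, the identical call succeeds.
        inputPort.heldFlowFiles.decrementAndGet();
        connection.setDestination(tempFunnel);
        System.out.println("swap accepted, destination=" + connection.getDestination().name);
    }
}
```

In this model the outcome depends only on when the swap is attempted relative to the port's activity, which would be consistent with the failure showing up only after several fall-out/rejoin cycles under load.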


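As a rough illustration of why the heartbeat settings in the reproduction steps make nodes fall out so quickly, the arithmetic can be sketched as below. This assumes the coordinator treats a node as lost once roughly (heartbeat interval x missable max) has passed without a heartbeat; that formula is an assumption made for illustration, and the class and method names are hypothetical.

```java
import java.time.Duration;

public class HeartbeatToleranceSketch {

    /**
     * Assumed model: a node is considered lost after
     * interval * missableMax without a heartbeat.
     */
    static Duration tolerance(Duration interval, int missableMax) {
        return interval.multipliedBy(missableMax);
    }

    public static void main(String[] args) {
        // Defaults quoted in the issue: interval 5 sec, missable.max 8.
        Duration defaults = tolerance(Duration.ofSeconds(5), 8);
        // Values used to reproduce the issue: interval 2 sec, missable.max 1.
        Duration repro = tolerance(Duration.ofSeconds(2), 1);

        System.out.println("default tolerance: " + defaults.getSeconds() + " s");
        System.out.println("repro tolerance:   " + repro.getSeconds() + " s");
    }
}
```

Under that assumption the defaults allow about 40 seconds of silence while the reproduction settings allow about 2 seconds, so a heavily loaded node would be expected to miss the window often.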

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
