[jira] [Updated] (NIFI-12232) Frequent failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption

John Joseph (Jira) Mon, 16 Oct 2023 03:46:05 -0700


     [ https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


John Joseph updated NIFI-12232:
-------------------------------
    Description: 
This is an issue we have been observing in NiFi 1.23.2 when we try to upgrade.

Since rolling upgrades are not supported in NiFi, we scale out the revision that is running and {_}run a helm upgrade{_}.

We run NiFi in cluster mode on Kubernetes (k8s), and a post job calls the Tenants and Policies APIs. A successful run produces output like this:
{code:java}
set_policies() Action: 'read' Resource: '/flow' entity_id: 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' entity_type: 'USER'
set_policies() status: '200'
'read' '/flow' policy already exists. It will be updated...
set_policies() fetching policy inside -eq 200 status: '200'
set_policies() after update PUT: '200'
set_policies() Action: 'read' Resource: '/tenants' entity_id: 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' entity_type: 'USER'
set_policies() status: '200'{code}
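The log suggests a read-modify-write cycle. The job looks the policy up and, if it already exists, PUTs an updated copy back. Below is a minimal sketch of how such a PUT body could be built in Python. It assumes the JSON shape of NiFi's `AccessPolicyEntity`: a `revision` plus a `component` holding `resource`, `action`, `users` and `userGroups`. The function `build_policy_update` is hypothetical, not the reporter's `set_policies()`, and the HTTP calls are left out.

```python
import copy
from typing import Any, Dict, List


def build_policy_update(policy: Dict[str, Any], user_id: str) -> Dict[str, Any]:
    """Build a PUT body that adds `user_id` to an existing access policy.

    `policy` is assumed to be the AccessPolicyEntity JSON returned by a GET on
    the policy. Its revision is sent back unchanged so that NiFi can reject the
    update if the policy was modified in between (optimistic locking).
    """
    component = policy["component"]
    # Keep only tenant ids; the GET response carries full tenant entities.
    users: List[Dict[str, str]] = [{"id": u["id"]} for u in component.get("users", [])]
    if all(u["id"] != user_id for u in users):
        users.append({"id": user_id})
    return {
        "revision": copy.deepcopy(policy["revision"]),
        "component": {
            "id": component["id"],
            "resource": component["resource"],
            "action": component["action"],
            "users": users,
            "userGroups": [{"id": g["id"]} for g in component.get("userGroups", [])],
        },
    }
```

Echoing the fetched revision back also makes the update safe to retry. If another client changed the policy in the meantime, NiFi should reject the stale revision instead of silently overwriting the change.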
*_This job ran fine in 1.23.0, 1.22, and earlier versions._* In {*}{{1.23.2}}{*}, the job fails very frequently with the following error logs:
{code:java}
set_policies() Action: 'read' Resource: '/flow' entity_id: 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' entity_type: 'USER'
set_policies() status: '200'
'read' '/flow' policy already exists. It will be updated...
set_policies() fetching policy inside -eq 200 status: '200'
set_policies() after update PUT: '400'
An error occurred getting 'read' '/flow' policy: 'This node is disconnected from its configured cluster. The requested change will only be allowed if the flag to acknowledge the disconnected node is set.'{code}
{{_*'This node is disconnected from its configured cluster. The requested change will only be allowed if the flag to acknowledge the disconnected node is set.'*_}}
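The error message refers to the `disconnectedNodeAcknowledged` flag that NiFi's REST request entities accept. Below is a minimal sketch in Python. It assumes the job has the HTTP status and response text of the failed PUT at hand; both helper names are hypothetical.

Recognising this specific 400 lets the job wait and retry instead of failing outright. Setting the flag is shown only for completeness. It lets the change through on the disconnected node, but it does not reconnect the node, and the change may not match what the rest of the cluster holds.

```python
from typing import Any, Dict

# Prefix of the message NiFi returned to the job (copied from the log above).
DISCONNECTED_MSG = "This node is disconnected from its configured cluster"


def is_disconnected_node_error(status: int, message: str) -> bool:
    """True when a request was rejected because the node is disconnected."""
    return status == 400 and DISCONNECTED_MSG in message


def with_disconnect_ack(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a request entity with disconnectedNodeAcknowledged set.

    This only bypasses the guard on the disconnected node; it does not fix
    the failed flow synchronization that caused the disconnection.
    """
    acknowledged = dict(body)  # shallow copy; the caller's dict is not modified
    acknowledged["disconnectedNodeAcknowledged"] = True
    return acknowledged
```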


The job is configured to run only after all the pods are up and running. Even though the pods are up, we see the following exception inside the pods:
{code:java}
org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption.
	at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1059)
	at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:667)
	at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:107)
	at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:396)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.nifi.controller.serialization.FlowSynchronizationException: java.lang.IllegalStateException: Cannot change destination of Connection because the current destination is running
	at org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:448)
	at org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:206)
	at org.apache.nifi.controller.serialization.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:42)
	at org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1530)
	at org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:104)
	at org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:817)
	at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1028)
	... 4 common frames omitted
Caused by: java.lang.IllegalStateException: Cannot change destination of Connection because the current destination is running
	at org.apache.nifi.connectable.StandardConnection.setDestination(StandardConnection.java:310)
	at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.updateConnectionDestinations(StandardVersionedComponentSynchronizer.java:700)
	at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:405)
	at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronizeChildGroups(StandardVersionedComponentSynchronizer.java:543)
	at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:427)
	at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.lambda$synchronize$0(StandardVersionedComponentSynchronizer.java:266)
	at org.apache.nifi.controller.flow.AbstractFlowManager.withParameterContextResolution(AbstractFlowManager.java:550)
	at org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:261)
	at org.apache.nifi.groups.StandardProcessGroup.synchronizeFlow(StandardProcessGroup.java:3977)
	at org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:439)
	... 10 common frames omitted{code}
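With several pods involved, it can help to detect this failure and pull out the innermost cause from each pod's log automatically. Below is a minimal sketch in Python. It assumes the log text has already been fetched, for example with `kubectl logs`, and that stack-trace lines are not wrapped. The helper names are hypothetical.

```python
from typing import Optional

# Marker taken from the FlowSynchronizationException message above.
PARTIAL_UPDATE_MARKER = "local flow controller partially updated"


def hit_partial_update(log_text: str) -> bool:
    """True if the log contains the partial-update synchronization failure."""
    return "FlowSynchronizationException" in log_text and PARTIAL_UPDATE_MARKER in log_text


def root_cause(trace: str) -> Optional[str]:
    """Return the message of the innermost 'Caused by:' entry of a Java trace.

    Falls back to the first line when there is no 'Caused by:' entry and
    returns None for empty input.
    """
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    if not lines:
        return None
    causes = [line[len("Caused by:"):].strip() for line in lines if line.startswith("Caused by:")]
    return causes[-1] if causes else lines[0]
```

For the trace in this report, `root_cause` yields the `IllegalStateException` about changing a connection's destination while it is running, which is the actual reason the synchronization stopped partway.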
Screenshots of the UI are attached as well. The issue also shows up frequently when checking with the CLI:
{code:java}
./cli.sh nifi cluster-summary \
  -u https://nifi-headless.doc-norc.svc.cluster.local:9443 \
  -ts /opt/nifi/cert_mgr/truststore.jks -tst jks -tsp changeit \
  -ks /opt/nifi/cert_mgr/keystore.jks -kst jks -ksp changeit
Total node count: 0
Connected node count: 0
Clustered: true
Connected to cluster: false{code}
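Pods being up does not imply that the NiFi nodes have joined the cluster, and the 0/0 summary shows exactly that. One possible gate for the post job is to wait until the summary reports every node connected.

Below is a minimal sketch in Python. It assumes `probe` is a caller-supplied function (hypothetical) that runs the same `cli.sh nifi cluster-summary` command and returns its stdout. The parser relies only on the four `Key: value` lines shown above. This only delays the job; it does not recover a node stuck on the synchronization error.

```python
import time
from typing import Callable, Dict, Union

Summary = Dict[str, Union[int, bool]]


def parse_cluster_summary(text: str) -> Summary:
    """Parse the 'Key: value' lines printed by `cli.sh nifi cluster-summary`."""
    summary: Summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if value in ("true", "false"):
            summary[key] = value == "true"
        elif value.isdigit():
            summary[key] = int(value)
        # Anything else (e.g. the echoed command line) is ignored.
    return summary


def is_fully_connected(summary: Summary) -> bool:
    """True only if this node is connected and every known node is connected."""
    total = summary.get("Total node count", 0)
    return (
        summary.get("Connected to cluster") is True
        and isinstance(total, int)
        and total > 0
        and summary.get("Connected node count") == total
    )


def wait_until_connected(
    probe: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until the cluster is fully connected or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if is_fully_connected(parse_cluster_summary(probe())):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

`sleep` and `clock` are injectable so the gate can be tested without real waiting. If it times out, the job should fail loudly rather than push policies to a disconnected node.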
 
We tried the following workaround:
{code:java}
1. Exec into the pod that has the flow issue and delete the flow definition file so that it is removed from the PVC.
2. Exit the pod.
3. Delete the pod that had the problem.{code}
The pod respawns, and the cluster coordinator recreates the flow from the connected nodes.
This reconnected all the nodes, but it does not feel like an ideal solution: we see this issue quite often and cannot apply this workaround every time.
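The manual workaround can be scripted, although it remains the same workaround. Below is a minimal sketch in Python that only builds and prints the `kubectl` commands for review rather than running them. It assumes NiFi runs as a StatefulSet, so a deleted pod is recreated with the same PVC. The pod name `nifi-0` and the flow file paths are hypothetical placeholders; which files actually hold the flow depends on the flow configuration properties in the chart's `nifi.properties`. The namespace `doc-norc` is taken from the service URL in the CLI example.

```python
import shlex
from typing import List, Sequence


def workaround_commands(namespace: str, pod: str, flow_paths: Sequence[str]) -> List[List[str]]:
    """Build, but do not run, the kubectl commands for the manual workaround.

    A non-interactive `kubectl exec` covers steps 1 and 2 (delete the local
    flow definition, no shell left open). Deleting the pod (step 3) lets its
    controller recreate it so the node rejoins and takes the cluster's flow.
    """
    if not flow_paths:
        raise ValueError("at least one flow definition path is required")
    return [
        ["kubectl", "exec", "-n", namespace, pod, "--", "rm", "-f", *flow_paths],
        ["kubectl", "delete", "pod", pod, "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical pod name and paths: verify them against your deployment first.
    commands = workaround_commands(
        "doc-norc",
        "nifi-0",
        ["/opt/nifi/nifi-current/conf/flow.json.gz", "/opt/nifi/nifi-current/conf/flow.xml.gz"],
    )
    for command in commands:
        print(shlex.join(command))
```

Deleting the local flow discards anything that exists only on that node, so this is only reasonable while the other nodes hold the authoritative flow.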

!image-2023-10-16-16-12-31-027.png!

> Frequent failed to connect node to cluster because local flow controller 
> partially updated. Administrator should disconnect node and review flow for 
> corruption
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-12232
>                 URL: https://issues.apache.org/jira/browse/NIFI-12232
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Configuration Management
>    Affects Versions: 1.23.2
>            Reporter: John Joseph
>            Priority: Major
>         Attachments: image-2023-10-16-16-12-31-027.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
