[ 
https://issues.apache.org/jira/browse/NIFI-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Joseph updated NIFI-12459:
-------------------------------
    Description: 
This issue has been observed in both 1.22 and 1.23.2. NiFi is running in cluster 
mode in k8s (3 pods) with embedded ZooKeeper enabled. The nodes appear to be 
disconnected, and the UI shows a pop-up that says:
"This node is currently not connected to the cluster. Any modifications to the 
data flow made here will not replicate across the cluster." (PFA)

Two of the three NiFi pods log the following message:

[Clustering Tasks Thread-3] o.apache.nifi.controller.FlowController Failed to 
send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: 
Failed to send message to Cluster Coordinator


So we:
exec'd into each pod having the issue (kubectl exec -it <nifi-0> -n <namespace> -- bash), 
changed to opt/nifi/data/conf_directory, 
deleted the flow.xml.gz file, 
exited the pod, 
deleted the pod using the kubectl delete po <nifi-0> -n <namespace> command.
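
For reference, the steps above can be sketched as a small Python helper that only builds the kubectl command lines and does not run them. The pod name, namespace, and conf directory passed in are placeholders, and the non-interactive "kubectl exec ... -- rm" form is an assumption standing in for the interactive shell session we actually used. Note that deleting flow.xml.gz discards that node's local copy of the flow.

```python
import shlex


def recovery_commands(pod, namespace, conf_dir):
    """Build the kubectl invocations for the steps above as argv lists.

    pod, namespace and conf_dir are placeholders supplied by the caller;
    nothing is executed here.
    """
    flow_file = f"{conf_dir.rstrip('/')}/flow.xml.gz"
    return [
        # Remove the node's local flow file inside the pod.
        ["kubectl", "exec", pod, "-n", namespace, "--", "rm", flow_file],
        # Delete the pod; in our case it came back up afterwards.
        ["kubectl", "delete", "pod", pod, "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for argv in recovery_commands("nifi-0", "my-namespace", "/path/to/conf"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running anything against the cluster.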
(This was once suggested to us by one of the members here.)

Result:
The pods came back up, but this time the UI showed a different error:
"The Flow Controller is initializing the Data Flow" (PFA).
The logs contain no exception stack trace, only this:
[main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due 
to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket 
to nifi-2/<unresolved>:11443 due to: java.net.UnknownHostException: nifi-2
Node disconnections like this happen for us occasionally (not frequently).
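
The UnknownHostException above means the name nifi-2 could not be resolved from inside the failing pod. A minimal sketch of a check that separates "name does not resolve" from "name resolves but the port is not reachable" is shown below. The function name check_peer and the three result strings are made up for illustration; the port 11443 and the host nifi-2 come from the log line, while any other host names you pass in are assumptions about your own setup.

```python
import socket


def check_peer(host, port, timeout=3.0):
    """Return 'ok', 'unresolved', or 'unreachable' for host:port."""
    try:
        # Name resolution step; this is where an unknown host fails.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolved"
    # Try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "unreachable"


# Example call from inside a pod (host taken from the log line above):
#   print(check_peer("nifi-2", 11443))
```

An "unresolved" result would point at DNS or service naming inside the cluster rather than at NiFi itself, whereas "unreachable" would suggest the peer is resolvable but not listening or not yet started.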
 
!image-2023-12-04-10-11-08-601.png|width=762,height=382!
!image-2023-12-04-10-11-53-220.png!


> The node disconnection on nifi cluster mode. Failed to connect to cluster due 
> to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create 
> socket to nifi-2/<unresolved>:11443 due to: java.net.UnknownHostException
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-12459
>                 URL: https://issues.apache.org/jira/browse/NIFI-12459
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: NiFi Stateless
>    Affects Versions: 1.22.0, 1.23.2
>            Reporter: John Joseph
>            Priority: Major
>         Attachments: image-2023-12-04-10-11-08-601.png, 
> image-2023-12-04-10-11-53-220.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
