Hello,

We're running a 4-node cluster on NiFi 1.7.1. The fourth node was added recently, and as soon as we added it we started seeing the warning below:
Response time from NODE2 was slow for each of the last 3 requests made. To see more information about timing, enable DEBUG logging for org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator

Initially we thought the problem was with the recently added node, so we cross-checked all the configs on that box, and everything seemed fine. After enabling DEBUG logging for the cluster, we noticed that the warning is not specific to any one node: every time we see a warning like the one above, there is one slow node that takes far longer than the others to respond, as shown below (in this case the slow node is NIFI04). Sometimes this leads to node disconnects that need manual intervention.

DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis

DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis

DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID e4726027-27c7-4bbb-8ab6-d02bb41f1920):
NIFI04:8443: 1053 millis
NIFI02:8443: 3 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis

We tried changing settings in nifi.properties, such as increasing "nifi.cluster.node.protocol.max.threads", but none of the changes helped, and we're still stuck with slow communication between the nodes. We use an external ZooKeeper, as this is our production server.
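In case it helps anyone look at the same pattern, here is a minimal sketch (plain Python, not part of NiFi) for tallying which node is slowest per replicated request in these DEBUG lines. The function names and regular expressions are assumptions based only on the log format shown above, so they may need adjusting for other log layouts.

```python
import re
from collections import Counter

# Assumed format, taken from the DEBUG lines quoted in this post:
#   "... Node Responses for GET /path (Request ID <uuid>):" followed by
#   "<host>:<port>: <n> millis" entries, on one line or on separate lines.
HEADER = re.compile(r"Node Responses for (\S+) (\S+) \(Request ID ([0-9a-fA-F-]+)\):")
TIMING = re.compile(r"([\w.-]+):(\d+): (\d+) millis")


def parse_replication_timings(log_text):
    """Return one dict per replicated request with per-node response times."""
    headers = list(HEADER.finditer(log_text))
    requests = []
    for i, header in enumerate(headers):
        # Timings for this request run up to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        body = log_text[header.end():end]
        timings = {m.group(1): int(m.group(3)) for m in TIMING.finditer(body)}
        requests.append({
            "method": header.group(1),
            "path": header.group(2),
            "request_id": header.group(3),
            "timings": timings,
        })
    return requests


def slowest_node_counts(requests):
    """Count how often each node had the largest response time."""
    counts = Counter()
    for request in requests:
        if request["timings"]:
            slowest = max(request["timings"], key=request["timings"].get)
            counts[slowest] += 1
    return counts


# The three DEBUG entries from this post, used as sample input.
SAMPLE_LOG = (
    "DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator "
    "Node Responses for GET /nifi-api/site-to-site "
    "(Request ID b2c6e983-5233-4007-bd54-13d21b7068d5): "
    "NIFI04:8443: 1386 millis NIFI02:8443: 3 millis "
    "NIFI01:8443: 5 millis NIFI03:8443: 3 millis "
    "DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator "
    "Node Responses for GET /nifi-api/site-to-site "
    "(Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622): "
    "NIFI04:8443: 1143 millis NIFI02:8443: 22 millis "
    "NIFI01:8443: 3 millis NIFI03:8443: 2 millis "
    "DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator "
    "Node Responses for GET /nifi-api/site-to-site "
    "(Request ID e4726027-27c7-4bbb-8ab6-d02bb41f1920): "
    "NIFI04:8443: 1053 millis NIFI02:8443: 3 millis "
    "NIFI01:8443: 3 millis NIFI03:8443: 2 millis"
)

if __name__ == "__main__":
    parsed = parse_replication_timings(SAMPLE_LOG)
    print(slowest_node_counts(parsed))
```

Running it over a longer stretch of nifi-app.log would show whether the slow responder really rotates between nodes or whether one node dominates.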
Below are some of our configs:

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.node.max.concurrent.requests=1000
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=

Any thoughts on why this is happening?

-Karthik