Hello,

We're running a 4-node cluster on NiFi 1.7.1. The fourth node was added 
recently, and as soon as we added it we started seeing the warning below:

Response time from NODE2 was slow for each of the last 3 requests made. To see 
more information about timing, enable DEBUG logging for 
org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator

Initially we thought the problem was with the newly added node, so we 
cross-checked all the configs on that box, and everything seemed fine. After 
enabling DEBUG logging for the replicator class, we noticed that the warning is 
not specific to any one node: every time we see a warning like the one above, 
there is one slow node that takes far longer than the others to respond, as 
shown below (in this case the slow node is NIFI04). Sometimes this leads to 
node disconnects that require manual intervention.

DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
e4726027-27c7-4bbb-8ab6-d02bb41f1920):
NIFI04:8443: 1053 millis
NIFI02:8443: 3 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
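
In case it helps anyone tally which node is the slow one across many requests, 
here is a rough Python sketch that scans DEBUG output shaped like the excerpt 
above. The function names and the 1000 ms threshold are my own assumptions for 
illustration, not anything NiFi provides, and the pattern only covers the 
"HOST:PORT: N millis" layout shown here.

```python
import re
from collections import defaultdict

# Matches per-node timing lines such as "NIFI04:8443: 1386 millis".
NODE_LINE = re.compile(r"^\s*(\S+):(\d+):\s+(\d+)\s+millis\s*$")

def slowest_per_request(log_text):
    """Return one (node, millis) pair per request block: the slowest responder."""
    slowest = []
    current = []  # timings collected for the request block being read
    for line in log_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            current.append((match.group(1), int(match.group(3))))
        elif "Node Responses for" in line and current:
            # A new request block starts, so close out the previous one.
            slowest.append(max(current, key=lambda pair: pair[1]))
            current = []
    if current:
        slowest.append(max(current, key=lambda pair: pair[1]))
    return slowest

def count_slow_nodes(log_text, threshold_ms=1000):
    """Count how often each node was the slowest responder at or above threshold_ms.

    threshold_ms is a hypothetical cut-off chosen for this sketch.
    """
    counts = defaultdict(int)
    for node, millis in slowest_per_request(log_text):
        if millis >= threshold_ms:
            counts[node] += 1
    return dict(counts)
```

Running count_slow_nodes over a longer stretch of nifi-app.log should show 
whether the slow responder rotates between nodes or is usually the same one.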

We tried changing settings in nifi.properties, such as bumping up 
"nifi.cluster.node.protocol.max.threads", but none of the changes helped and 
we're still stuck with slow communication between the nodes. We use an 
external ZooKeeper, as this is our production server.
Below are some of our configs:

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.node.max.concurrent.requests=1000
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=

Any thoughts on why this is happening?


-Karthik
