Mark,

By and large, when I see timeouts on cluster request replication, the culprit is usually garbage collection. So you may not be thread-limited, CPU-limited, or resource-limited at all; garbage collection may simply be kicking in at an inopportune time. In that case, I would suggest setting nifi.cluster.node.read.timeout to, say, 30 seconds instead of 10, and looking into how garbage collection is performing on your system.
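As a rough sketch of those two suggestions, the configuration could look something like the following. The timeout property name and value come from this thread. The GC logging lines are an assumption: they use the standard Java 8 HotSpot flags, the java.arg indices (20 to 22) are arbitrary and only need to be unused in your bootstrap.conf, and the log path is hypothetical.

```properties
# nifi.properties: give nodes more time to respond to replicated requests
nifi.cluster.node.read.timeout=30 sec

# bootstrap.conf: enable GC logging (assumes a Java 8 HotSpot JVM;
# the indices and the log path are placeholders, adjust for your install)
java.arg.20=-XX:+PrintGCDetails
java.arg.21=-XX:+PrintGCDateStamps
java.arg.22=-Xloggc:./logs/gc.log
```

If long pauses in the GC log line up with the "Read timed out" messages in nifi-app.log, that would point at garbage collection rather than thread limits.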
I have answered specific questions below, though, in case they are helpful.

Thanks
-Mark

> On Nov 20, 2017, at 3:25 PM, Mark Bean <[email protected]> wrote:
>
> We are seeing cases where a user attempts to query provenance on a cluster.
> One or more Nodes may not respond to the request in a timely manner, and are
> then disconnected from the cluster. The nifi-app.log shows log
> messages similar to:
>
> ThreadPoolRequestReplicator Failed to replicate request POST
> /nifi-api/provenance to {host:port} due to
> com.sun.jersey.api.client.ClientHandlerException:
> java.net.SocketTimeoutException: Read timed out
> NodeClusterCoordinator The following nodes failed to process URI
> /nifi-api/provenance '{list of one or more nodes}'. Requesting each node
> disconnect from cluster.
>
> We have implemented a custom authorizer. For certain policies, additional
> authorization checking is performed. Provenance is one such policy which
> performs additional checking. It is surprising that the process is taking
> so long as to time out the request. Currently, timeouts are set as:
> nifi.cluster.node.read.timeout=10 sec
> nifi.cluster.request.replication.claim.timeout=30 sec
>
> This leads me to believe we are thread-limited, not CPU-limited.
>
> In this scenario, what threads are involved? Would
> nifi.cluster.node.protocol.threads (or .max.threads) be limiting the
> processing of such api calls?

>>> The threads involved are the Jetty threads on the 'receiving' side and
>>> the nifi.cluster.node.protocol.threads on the client side.

>
> Are the API provenance requests limited by
> nifi.provenance.repository.query.threads?

>>> These query threads are background threads that are used to populate the
>>> results of the query. Client requests will not block on those results.

>
> Are there other thread-related properties we should be looking at?
>

>>> I don't think so. I can't think of any off the top of my head, anyway.

> Are thread properties (such as nifi.provenance.repository.query.threads)
> counted against the total threads given by nifi.web.jetty.threads?

>>> No, these are separate thread pools.

>
> Thanks,
> Mark
