Mark Payne created NIFI-10052:
---------------------------------

             Summary: Avoid obtaining any locks when creating/sending heartbeats
                 Key: NIFI-10052
                 URL: https://issues.apache.org/jira/browse/NIFI-10052
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne


When NiFi creates a heartbeat to send to the coordinator, it must obtain a few 
locks in order to generate that heartbeat. We should avoid obtaining any read 
locks, write locks, or synchronized monitors, especially those that may be held 
for a while. Obtaining such locks can result in NiFi getting disconnected from 
the cluster if a write lock is held for a long time.

Specifically, the following locks are obtained, at minimum:
 * FlowController readLock in the createHeartbeatMessage() method. Due to 
refactoring, this read lock is not necessary at all.
 * revisionManager.getRevisionUpdateCount() is synchronized. However, the 
synchronization here is not needed, as it just returns an AtomicLong.get(). 
This is perhaps the most important lock to avoid because any update to a 
component or group of components happens within revisionManager.updateRevision, 
which is also synchronized. So a large request, such as deleting thousands of 
components, will block heartbeats from being created until the request completes.
 * FlowController.getTotalFlowFileCount - this may be the most challenging to 
eliminate. It calls ProcessGroup.getConnections() and 
ProcessGroup.getProcessGroups(), which means that it must obtain the read lock 
of every Process Group in the flow twice. We may be able to change 
StandardProcessGroup's connections and processGroups maps to 
ConcurrentHashMaps and introduce a getQueueSize() method on ProcessGroup that 
avoids having to lock so much.
 * The createHeartbeatMessage() method also appears to reference 
FlowController's {{connectionStatus}} member variable without any locks, 
although it is not volatile and its documentation indicates that it is guarded 
by the read/write lock. This needs to be addressed in order to ensure that the 
connectionStatus is always accurately reported.
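
As a rough illustration of the direction described in the list above, here is a 
minimal Java sketch of lock-free read paths. It is not NiFi's actual code: the 
classes RevisionCounter, Group, and Node, the ConnectionState enum, and the 
addConnection/addGroup helpers are hypothetical stand-ins. Only the names 
getRevisionUpdateCount, updateRevision, getQueueSize, and connectionStatus come 
from this issue. It assumes the count is backed by an AtomicLong, the maps are 
ConcurrentHashMaps, and the status field can simply be made volatile.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for illustration only; these are NOT NiFi's classes.
final class LockFreeHeartbeatSketch {

    /** Revision counter whose read path does not take the object's monitor. */
    static final class RevisionCounter {
        private final AtomicLong updateCount = new AtomicLong();

        // Writers may still serialize among themselves (assumed, as in the issue).
        synchronized void updateRevision() {
            updateCount.incrementAndGet();
        }

        // Not synchronized: AtomicLong.get() is already a thread-safe read.
        long getRevisionUpdateCount() {
            return updateCount.get();
        }
    }

    /** Group whose children live in concurrent maps, so reads take no group lock. */
    static final class Group {
        private final Map<String, AtomicLong> connectionQueueSizes = new ConcurrentHashMap<>();
        private final Map<String, Group> childGroups = new ConcurrentHashMap<>();

        void addConnection(String id, long queuedFlowFiles) {
            connectionQueueSizes.put(id, new AtomicLong(queuedFlowFiles));
        }

        void addGroup(String id, Group child) {
            childGroups.put(id, child);
        }

        // ConcurrentHashMap iteration is weakly consistent, so under concurrent
        // modification the total is an approximate snapshot, not an exact count.
        long getQueueSize() {
            long total = 0;
            for (AtomicLong queued : connectionQueueSizes.values()) {
                total += queued.get();
            }
            for (Group child : childGroups.values()) {
                total += child.getQueueSize();
            }
            return total;
        }
    }

    enum ConnectionState { CONNECTED, DISCONNECTED }

    /** Status field made volatile so lock-free readers see the latest write. */
    static final class Node {
        private volatile ConnectionState connectionStatus = ConnectionState.DISCONNECTED;

        void setConnectionStatus(ConnectionState status) {
            connectionStatus = status;
        }

        ConnectionState getConnectionStatus() {
            return connectionStatus;
        }
    }

    public static void main(String[] args) {
        RevisionCounter revisions = new RevisionCounter();
        revisions.updateRevision();

        Group root = new Group();
        Group child = new Group();
        root.addConnection("c1", 3);
        child.addConnection("c2", 4);
        root.addGroup("g1", child);

        Node node = new Node();
        node.setConnectionStatus(ConnectionState.CONNECTED);

        // None of these reads acquires a read lock, write lock, or monitor.
        System.out.println(revisions.getRevisionUpdateCount());
        System.out.println(root.getQueueSize());
        System.out.println(node.getConnectionStatus());
    }
}
```

The trade-off in this sketch is that the queue size becomes a best-effort value 
while the flow is being modified. That is probably acceptable for a heartbeat, 
but it would need to be confirmed against how the coordinator uses the count.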


