[ https://issues.apache.org/jira/browse/NIFI-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414963#comment-17414963 ]
Mark Payne commented on NIFI-9217:
----------------------------------
Given this call stack:
ElectionListener.verifyLeader -> CuratorLeaderElectionManager.getLeaderRole
it appears that #getLeaderRole would also cause issues, because it is also
synchronized on CuratorLeaderElectionManager.
Additionally, CuratorLeaderElectionManager.getLeader calls
CuratorLeaderElectionManager.onLeaderChanged, which is also synchronized on
CuratorLeaderElectionManager.
It's important that CuratorLeaderElectionManager.getLeaderRole neither be
synchronized nor call a synchronized method, because it is called while the
ElectionListener monitor is held.
#onLeaderChanged and #getLeadershipChangeCount could easily be refactored to
not be synchronized by making CuratorLeaderElectionManager.leaderChanges
a ConcurrentHashMap.
Fortunately, it appears that the CuratorLeaderElectionManager.leaderRoles
HashMap can also easily become a ConcurrentHashMap, which would allow us to
eliminate the synchronized monitor.
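
As a rough illustration of the refactoring suggested above, the sketch below replaces synchronized access to the two maps with ConcurrentHashMap. This is not the actual NiFi code: the class name ElectionStateSketch, the registerRole helper, the String role values, the LongAdder counters, and the simplified method signatures are all assumptions made for the example. Only the names leaderChanges, leaderRoles, onLeaderChanged, getLeadershipChangeCount, and getLeaderRole come from the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the relevant state of CuratorLeaderElectionManager.
// Value types and signatures are simplified assumptions, not the NiFi originals.
class ElectionStateSketch {
    // Both maps are concurrent, so no method below needs 'synchronized'.
    private final Map<String, LongAdder> leaderChanges = new ConcurrentHashMap<>();
    private final Map<String, String> leaderRoles = new ConcurrentHashMap<>();

    // Hypothetical helper so the example has something to look up.
    void registerRole(final String roleName, final String role) {
        leaderRoles.put(roleName, role);
    }

    // Safe to call while another object's monitor is held: no lock on 'this'.
    String getLeaderRole(final String roleName) {
        return leaderRoles.get(roleName);
    }

    // computeIfAbsent is atomic per key; LongAdder handles concurrent increments.
    void onLeaderChanged(final String roleName) {
        leaderChanges.computeIfAbsent(roleName, key -> new LongAdder()).increment();
    }

    long getLeadershipChangeCount(final String roleName) {
        final LongAdder count = leaderChanges.get(roleName);
        return count == null ? 0L : count.sum();
    }
}
```

Because none of these methods takes the manager's monitor, calling them while the ElectionListener monitor is held cannot add an edge to a lock-ordering cycle between the two objects. The trade-off is that an operation touching both maps is no longer atomic as a whole, so any code relying on that would need to be reviewed.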
> Possible deadlock when node is disconnected
> -------------------------------------------
>
> Key: NIFI-9217
> URL: https://issues.apache.org/jira/browse/NIFI-9217
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Paul Grey
> Priority: Critical
>
> When offloading a node, I encountered a deadlock. Grabbing a thread dump
> shows the following two threads are in a deadlock:
> {code}
> "Disconnect from Cluster" Id=167 BLOCKED on org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1f9bfb51
> ** DEADLOCKED THREAD ** ** MONITOR-DEADLOCKED THREAD **
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.setLeader(CuratorLeaderElectionManager.java:530)
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.disable(CuratorLeaderElectionManager.java:497)
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.unregister(CuratorLeaderElectionManager.java:182)
>     - waiting on org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager@7aa4e8dc
>     at org.apache.nifi.controller.FlowController.onClusterDisconnect(FlowController.java:2456)
>     at org.apache.nifi.controller.FlowController.setClustered(FlowController.java:2434)
>     at org.apache.nifi.controller.StandardFlowService.disconnect(StandardFlowService.java:771)
>     at org.apache.nifi.controller.StandardFlowService.handleDisconnectionRequest(StandardFlowService.java:752)
>     at org.apache.nifi.controller.StandardFlowService.access$400(StandardFlowService.java:112)
>     at org.apache.nifi.controller.StandardFlowService$3.run(StandardFlowService.java:425)
>     at java.lang.Thread.run(Thread.java:748)
> Number of Locked Synchronizers: 2
>     - java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3d3a1057
>     - java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3d3cb514
>
> "Process Cluster Protocol Request-3" Id=165 BLOCKED on org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager@7aa4e8dc
> ** DEADLOCKED THREAD ** ** MONITOR-DEADLOCKED THREAD **
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.registerPollTime(CuratorLeaderElectionManager.java:304)
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:293)
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.verifyLeader(CuratorLeaderElectionManager.java:556)
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.isLeader(CuratorLeaderElectionManager.java:510)
>     - waiting on org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1f9bfb51
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$LeaderRole.isLeader(CuratorLeaderElectionManager.java:451)
>     at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.isLeader(CuratorLeaderElectionManager.java:261)
>     at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.isActiveClusterCoordinator(NodeClusterCoordinator.java:823)
>     at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.handleNodeStatusChange(NodeClusterCoordinator.java:1168)
>     at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.handle(NodeClusterCoordinator.java:1097)
>     at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:176)
>     at org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:131)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Number of Locked Synchronizers: 1
>     - java.util.concurrent.ThreadPoolExecutor$Worker@c4ecdba
> {code}