[ https://issues.apache.org/jira/browse/QPID-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643051#comment-13643051 ]
Alan Conway edited comment on QPID-4780 at 4/26/13 5:29 PM:
------------------------------------------------------------

Unable to reproduce, but inspection of the code found a lock-ordering deadlock that would lead to the stack trace given.

In one thread:
- Link::ioThreadProcessing takes Link::lock, then calls
- QueueReplicator::initializeBridge, which tries to take QueueReplicator::lock.

Concurrently, in another thread:
- QueueReplicator::destroy takes QueueReplicator::lock, then calls
- Bridge::destroy, which tries to take Link::lock.

Each thread holds the lock the other is waiting for, so neither can proceed. This patch removes the locking around destroyBridge.

------------------------------------------------------------------------
r1476305 | aconway | 2013-04-26 13:28:26 -0400 (Fri, 26 Apr 2013) | 9 lines

QPID-4780: Bug 889552 - HA broker deadlock after loss of primary broker.

Lock ordering deadlock found by inspection of code and stack trace:
- thread 1: Link::ioThreadProcessing(Link::lock) -> QueueReplicator::initializeBridge(QueueReplicator::lock)
- thread 2: QueueReplicator::destroy(QueueReplicator::lock) -> Bridge::destroy(Link::lock)

This patch breaks the deadlock by removing locking around Bridge::destroy in QueueReplicator::destroy.

Committed to trunk
------------------------------------------------------------------------

> HA broker deadlock after loss of primary broker
> ------------------------------------------------
>
>                 Key: QPID-4780
>                 URL: https://issues.apache.org/jira/browse/QPID-4780
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: 0.20
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> Description of problem:
> While fencing nodes in a cluster, we occasionally encounter an issue where a
> broker that was previously a backup becomes deadlocked while deleting
> auto-delete queues. The issue was only noticed because 'qpid-ha promote'
> hangs when attempting to promote a backup to primary.
>
> Version-Release number of selected component (if applicable):
> Qpid 0.18
>
> How reproducible:
> Rare (race condition)
> see also: https://bugzilla.redhat.com/show_bug.cgi?id=889552
>
> Steps to Reproduce:
> 1. Start HA-enabled brokers
> 2. Create tens of thousands of auto-delete queues
> 3. Fence / power-cycle the node hosting the primary broker
>
> Actual results:
> Occasionally the backup broker deadlocks
>
> Expected results:
> The backup broker does not deadlock
>
> Additional info:
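The two call paths described in the comment form a classic lock-ordering (ABBA) cycle: thread 1 acquires Link::lock and then QueueReplicator::lock, while thread 2 acquires them in the opposite order. The C++11 sketch below models only that locking pattern, using hypothetical, stripped-down stand-ins for Link, Bridge and QueueReplicator. None of the members shown are the real Qpid API, and the helpers destroyDeadlockProne and runConcurrentScenario are invented for illustration. The fixed destroy() shows one way to avoid holding QueueReplicator::lock across Bridge::destroy: detach the bridge under the lock, then call destroy after releasing it. This is an assumption about the general shape of the fix, not the actual r1476305 diff.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical, stripped-down stand-ins for the broker classes named in the
// comment. Only the order in which the two locks are taken is modelled.

class QueueReplicator;

class Link {
  public:
    std::mutex lock;                              // stands in for Link::lock
    std::shared_ptr<QueueReplicator> replicator;
    void ioThreadProcessing();                    // thread 1's entry point
};

class Bridge {
  public:
    explicit Bridge(Link& l) : owner(l) {}
    void destroy() {
        std::lock_guard<std::mutex> g(owner.lock); // Bridge::destroy needs Link::lock
        assert(!destroyed);                        // each bridge is destroyed once
        destroyed = true;
    }
    bool destroyed = false;

  private:
    Link& owner;
};

class QueueReplicator {
  public:
    std::mutex lock;                              // stands in for QueueReplicator::lock
    std::shared_ptr<Bridge> bridge;
    int initialized = 0;

    // Reached from Link::ioThreadProcessing with Link::lock already held,
    // so the order on this path is Link::lock -> QueueReplicator::lock.
    void initializeBridge() {
        std::lock_guard<std::mutex> g(lock);
        ++initialized;
    }

    // Pre-fix shape, for comparison only: QueueReplicator::lock is held while
    // Bridge::destroy waits for Link::lock, the reverse order. Running this
    // concurrently with Link::ioThreadProcessing can deadlock.
    void destroyDeadlockProne() {
        std::lock_guard<std::mutex> g(lock);
        if (bridge) bridge->destroy();
        bridge.reset();
    }

    // Fixed shape: detach the bridge under QueueReplicator::lock, release it,
    // and only then call Bridge::destroy (which takes Link::lock).
    void destroy() {
        std::shared_ptr<Bridge> b;
        {
            std::lock_guard<std::mutex> g(lock);
            b.swap(bridge);
        }
        if (b) b->destroy();
    }
};

void Link::ioThreadProcessing() {
    std::lock_guard<std::mutex> g(lock);
    if (replicator) replicator->initializeBridge();
}

// Runs both call paths concurrently `iterations` times and counts the runs in
// which the bridge was both initialized and destroyed. With destroy(), no
// thread waits for Link::lock while holding QueueReplicator::lock, so the
// lock-order cycle cannot form and every join returns.
int runConcurrentScenario(int iterations) {
    int completed = 0;
    for (int i = 0; i < iterations; ++i) {
        Link link;
        auto qr = std::make_shared<QueueReplicator>();
        qr->bridge = std::make_shared<Bridge>(link);
        link.replicator = qr;
        std::shared_ptr<Bridge> observed = qr->bridge;

        std::thread t1([&link] { link.ioThreadProcessing(); });
        std::thread t2([&qr] { qr->destroy(); });
        t1.join();
        t2.join();

        if (observed->destroyed && qr->initialized == 1) ++completed;
    }
    return completed;
}
```

Built with a C++11 compiler and thread support (e.g. `-pthread` on GCC or Clang), `runConcurrentScenario(200)` should return 200. If `qr->destroy()` in the second thread is replaced by `qr->destroyDeadlockProne()`, the cycle is reintroduced. The program can then hang, but only under unlucky scheduling, which fits the report's "Rare (race condition)".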