Alan Conway created QPID-4402:
---------------------------------
Summary: HA QMF events can be out of order.
Key: QPID-4402
URL: https://issues.apache.org/jira/browse/QPID-4402
Project: Qpid
Issue Type: Bug
Components: C++ Clustering
Affects Versions: 0.18
Reporter: Alan Conway
Assignee: Alan Conway
With the new replication-based clustering in 0.18 MRG-M, replication can hang
if QMF events arrive in the wrong order. I am running the following test, which
reproduces the hang:
- Start a client with 2 threads
- Each thread creates its own Connection, Session, and a Receiver using the
address "someQueue; {create:always, node: {x-declare: {auto-delete:True}}}"
- Run a loop like this (pseudocode):
    while (receiver.get(message)) {
        // do stuff
        if (at least 5 seconds have passed since the last (re)connect) {
            connection.close();
            reconnectAndRecreateReceiver();
            receiver.setCapacity(1000);
        }
    }
During this loop, both threads disconnect and reconnect every 5 seconds. On
connecting, one of them creates the queue; on disconnecting, the auto-delete
queue is deleted. Because no lock governs the order in which queue creation and
deletion events are emitted, a queue creation event can eventually arrive at the
backup broker before the preceding queue deletion event, i.e. in the wrong order.
When this happens, the backup broker does not subscribe to the primary to
replicate the queue in question, and replication hangs.
This is not strictly an HA problem: any QMF client may receive incorrectly
ordered events. It comes up in the HA context because HA relies heavily on QMF
events for replication.