Pavel Moravec created QPID-4082:
-----------------------------------
Summary: cluster de-sync after broker restart & queue replication
Key: QPID-4082
URL: https://issues.apache.org/jira/browse/QPID-4082
Project: Qpid
Issue Type: Bug
Components: C++ Clustering
Affects Versions: 0.16
Reporter: Pavel Moravec
Assignee: Alan Conway
Priority: Minor
Description of problem:
With queue state replication set up between two clusters, restarting _a_ broker in
both the source and the destination clusters sometimes leads to cluster de-sync. No QMF
communication is involved, though the symptoms are similar to the bug caused by
missing propagation of QMF errors within a cluster.
Version-Release number of selected component (if applicable):
Spotted in Qpid 0.14; expected to affect 0.16 as well.
How reproducible:
100% within 10 minutes.
Steps to Reproduce:
1. Have a two-node source cluster and a two-node destination cluster (see the
reproducer for an example configuration and for a script that performs the further steps).
2. Have a queue state replication between the clusters.
3. Randomly stop or start a broker in either cluster, such that both clusters
always have at least one node running (i.e. stop and start only non-elder brokers).
4. After each stop or start, send one message to the source broker, to a queue
that is replicated.
5. Wait some time
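The constraint in step 3 can be sketched as a small schedule generator. This is a minimal, hypothetical C++ sketch, not the reproducer script referenced above: ClusterSketch, StepSketch and makeSchedule are illustrative names, and it only prints which non-elder broker to toggle and how many messages have been sent so far. It assumes each two-node cluster has exactly one elder (never touched) and one non-elder broker.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Hypothetical state of one two-node cluster: the elder broker is never
// touched; only the non-elder broker is stopped and started (step 3).
struct ClusterSketch {
    std::string name;
    bool nonElderUp = true;
};

// One step of the scenario: the action taken and the running total of
// messages sent to the replicated queue after that action.
struct StepSketch {
    std::string action;  // e.g. "stop src" or "start dst"
    int messagesSent;
};

// Generates a random stop/start schedule. Because only non-elder brokers are
// toggled, each cluster always keeps at least its elder running.
std::vector<StepSketch> makeSchedule(unsigned seed, int steps) {
    assert(steps >= 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 1);
    ClusterSketch clusters[2] = {{"src"}, {"dst"}};
    std::vector<StepSketch> schedule;
    int sent = 1;  // one message sent once everything is running
    schedule.push_back({"start all", sent});
    for (int i = 0; i < steps; ++i) {
        ClusterSketch& c = clusters[pick(rng)];
        c.nonElderUp = !c.nonElderUp;
        ++sent;  // step 4: one message after each stop or start
        schedule.push_back({(c.nonElderUp ? "start " : "stop ") + c.name, sent});
    }
    return schedule;
}
```

For example, makeSchedule(7, 10) yields eleven entries; the first toggle is always a "stop", since both non-elder brokers start out running.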
Actual results:
The restarted broker in the source cluster may shut down after logging:
2012-05-31 11:58:40 critical cluster(10.34.1.218:26715 READY/error) local error
502 did not occur on member 10.34.1.218:26294: invalid-argument:
anonymous.b941dd87-3fa1-442d-99f7-8c0907599b30: confirmed < (24+0) but only
sent < (23+0) (qpid/SessionState.cpp:154)
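The message indicates that the peer confirmed commands up to point (24+0) on the federation session, while the restarted broker believed it had sent commands only up to point (23+0). A minimal C++ sketch of that kind of sender-side sanity check follows. SessionPointSketch and checkConfirmed are hypothetical names; the sketch uses plain integer comparison (ignoring the serial-number wraparound a real implementation would need), and it is not the code in qpid/SessionState.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical session point: a command id plus a byte offset within that
// command, mirroring the "(24+0)" notation in the log message.
struct SessionPointSketch {
    std::uint32_t command;
    std::uint64_t offset;
};

// Ordering by command id first, then by offset (no wraparound handling).
bool lessThan(const SessionPointSketch& a, const SessionPointSketch& b) {
    return a.command < b.command || (a.command == b.command && a.offset < b.offset);
}

std::string format(const SessionPointSketch& p) {
    std::ostringstream out;
    out << "(" << p.command << "+" << p.offset << ")";
    return out.str();
}

// Sender-side check: the peer may only confirm what this side has sent.
// Throws std::invalid_argument otherwise, with a message shaped like the log.
void checkConfirmed(const SessionPointSketch& confirmed, const SessionPointSketch& sent) {
    if (lessThan(sent, confirmed)) {
        throw std::invalid_argument("confirmed < " + format(confirmed) +
                                    " but only sent < " + format(sent));
    }
}
```

In the sketch, checkConfirmed({24, 0}, {23, 0}) throws, while equal points pass. In the reported failure, only the restarted member raised this error, which is why it reports that the error "did not occur on member" and shuts down.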
Expected results:
No such error
Additional info:
- the affected session is always the federation route for the queue state
replication
- stopping and starting both one source and one destination broker is essential
to the scenario; e.g. without (re)starting a destination broker, there is no error.
- a scenario that is sometimes almost deterministic:
1) start everything, send a message
2) stop a destination broker, send a message
3) stop a source broker, send a message
4) start the source broker, then the destination broker
5) wait some time (e.g. 10 seconds) and send a message
Sometimes I got the error instantly, sometimes never.
Patch to be proposed.