-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6396/
-----------------------------------------------------------
Review request for qpid, Alan Conway, Gordon Sim, and Ted Ross.
Description
-------
The cluster tests occasionally fail in test_federation_multilink_failover
with the following error:
cluster_tests.ShortTests.test_federation_multilink_failover ... fail
Error during test: Traceback (most recent call last):
    File "/home/kgiusti/Desktop/work/qpid/trunk/build/qpid/src/tests/python/commands/qpid-python-test", line 340, in run
      phase()
    File "/home/kgiusti/Desktop/work/qpid/trunk/qpid/cpp/src/tests/cluster_tests.py", line 992, in test_federation_multilink_failover
      assert self._verify_federation(src_cluster[1], "FedX/two", dst_cluster[1], "destQ2")
  AssertionError
The failure is caused by a race condition in the LinkRegistry code. When a new
connection event occurs for a federation Link, the LinkRegistry searches for a
Link instance that is trying to connect to that remote, so that it can assign
the connection to it. The search for the target Link is done under a lock, but
the assignment is done outside of the lock (to prevent lock inversion), so the
two steps are not atomic.
The proposed fix has LinkRegistry hold all disconnected Links in a separate
container, and perform the search of that container (and the removal on match)
while holding a lock.
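
For reviewers who want the idea without opening the diff, here is a minimal
sketch of the find-and-remove-under-lock pattern described above. It is not
the code in the patch: Link, PendingLinks, claim() and onConnectionOpened()
are hypothetical names, matching on host and port is an assumption, and it
uses std::mutex rather than the broker's own locking classes.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a federation link (not qpid::broker::Link).
struct Link {
    std::string host;
    int port;
    int connectionId;  // -1 means no connection has been assigned yet
    Link(std::string h, int p) : host(std::move(h)), port(p), connectionId(-1) {}
};

// Hypothetical container holding only the links that are still disconnected.
class PendingLinks {
    std::mutex lock;
    std::vector<std::shared_ptr<Link>> disconnected;

public:
    void add(std::shared_ptr<Link> link) {
        std::lock_guard<std::mutex> guard(lock);
        disconnected.push_back(std::move(link));
    }

    // Search and removal happen in one critical section, so a given link
    // can be handed to at most one caller.
    std::shared_ptr<Link> claim(const std::string& host, int port) {
        std::lock_guard<std::mutex> guard(lock);
        auto it = std::find_if(
            disconnected.begin(), disconnected.end(),
            [&](const std::shared_ptr<Link>& l) {
                return l->host == host && l->port == port;
            });
        if (it == disconnected.end()) return nullptr;
        std::shared_ptr<Link> found = *it;
        disconnected.erase(it);
        return found;
    }
};

// Called for a new connection event. The assignment still runs outside the
// lock, but by then the link has already been removed from the container.
bool onConnectionOpened(PendingLinks& pending, const std::string& host,
                        int port, int connectionId) {
    std::shared_ptr<Link> link = pending.claim(host, port);
    if (!link) return false;
    link->connectionId = connectionId;
    return true;
}
```

In this sketch a second connection event for the same host and port finds
nothing to claim, so it cannot overwrite the first assignment.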
This addresses bug QPID-4193.
https://issues.apache.org/jira/browse/QPID-4193
Diffs
-----
/trunk/qpid/cpp/src/qpid/broker/LinkRegistry.h 1368984
/trunk/qpid/cpp/src/qpid/broker/LinkRegistry.cpp 1368984
/trunk/qpid/cpp/src/qpid/cluster/Connection.cpp 1368984
/trunk/qpid/cpp/src/qpid/cluster/UpdateClient.cpp 1368984
Diff: https://reviews.apache.org/r/6396/diff/
Testing
-------
Federation and cluster unit tests.
Ran test_federation_multilink_failover repeatedly with no failures.
Thanks,
Kenneth Giusti