-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6396/
-----------------------------------------------------------

Review request for qpid, Alan Conway, Gordon Sim, and Ted Ross.


Description
-------

The cluster tests occasionally fail in test_federation_multilink_failover 
with the following error:

cluster_tests.ShortTests.test_federation_multilink_failover .......... fail
Error during test:  Traceback (most recent call last):
    File "/home/kgiusti/Desktop/work/qpid/trunk/build/qpid/src/tests/python/commands/qpid-python-test", line 340, in run
      phase()
    File "/home/kgiusti/Desktop/work/qpid/trunk/qpid/cpp/src/tests/cluster_tests.py", line 992, in test_federation_multilink_failover
      assert self._verify_federation(src_cluster[1], "FedX/two", dst_cluster[1], "destQ2")
  AssertionError

The failure is caused by a race condition in the LinkRegistry code.  When a 
new connection event occurs for a federation Link, the LinkRegistry searches 
for a Link instance that is attempting to connect to the remote, so that it 
can assign the connection to that Link.  The search for the target Link is 
done under a lock, but the assignment is done outside of the lock (to prevent 
lock inversion), so the search and the assignment are not atomic.

The proposed fix has LinkRegistry hold all disconnected Links in a separate 
container, and perform the search of that container (and the removal on match) 
while holding a lock.
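
The idea can be illustrated with a minimal, standalone sketch.  This is not 
the code in the diff: the Link struct, the PendingLinks class, the claim() 
method and the "host:port" key are all hypothetical names chosen for 
illustration, and std::mutex stands in for whatever lock type LinkRegistry 
actually uses.  The sketch only shows the find-and-remove step being done 
inside a single critical section.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a federation link; NOT the real qpid::broker::Link.
struct Link {
    std::string host;
    int port;
    std::string connectionId;  // empty until a connection is assigned
};

// Hypothetical container of links that are waiting for a connection.
class PendingLinks {
  public:
    // Register a disconnected link so a later connection event can find it.
    void add(const std::shared_ptr<Link>& link) {
        std::lock_guard<std::mutex> guard(lock);
        pending[key(link->host, link->port)] = link;
    }

    // Search and remove under the same lock, so a given link can be
    // handed to at most one caller.  Returns an empty pointer on no match.
    std::shared_ptr<Link> claim(const std::string& host, int port) {
        std::lock_guard<std::mutex> guard(lock);
        auto it = pending.find(key(host, port));
        if (it == pending.end()) return nullptr;
        std::shared_ptr<Link> link = it->second;
        pending.erase(it);
        return link;
    }

  private:
    static std::string key(const std::string& host, int port) {
        return host + ":" + std::to_string(port);
    }

    std::mutex lock;
    std::map<std::string, std::shared_ptr<Link>> pending;
};
```

In this sketch the caller would assign the connection to the returned link 
after claim() returns, that is, outside the lock.  Because the link has 
already been removed from the container, a second call to claim() for the 
same host and port returns an empty pointer rather than the same link.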


This addresses bug QPID-4193.
    https://issues.apache.org/jira/browse/QPID-4193


Diffs
-----

  /trunk/qpid/cpp/src/qpid/broker/LinkRegistry.h 1368984 
  /trunk/qpid/cpp/src/qpid/broker/LinkRegistry.cpp 1368984 
  /trunk/qpid/cpp/src/qpid/cluster/Connection.cpp 1368984 
  /trunk/qpid/cpp/src/qpid/cluster/UpdateClient.cpp 1368984 

Diff: https://reviews.apache.org/r/6396/diff/


Testing
-------

Federation and cluster unit tests.
Ran test_federation_multilink_failover repeatedly with no crash.


Thanks,

Kenneth Giusti
