> On Aug. 7, 2012, 4:51 p.m., Alan Conway wrote:
> > Where is "the assignment is done outside of the lock" in the old code? The 
> > assignment in notifyConnection seems to have the same locking in both old 
> > and new code.
> 
> Kenneth Giusti wrote:
>     You're correct, the call to link::established() still happens outside the 
> lock (it has to, to avoid lock inversion).
>     
>     What the patch does is prevent simultainous calls into 
> LinkRegistry::notifyConnection (from parallel threads) from being able to 
> mistakenly select the _same_ pending Link.  It does this by holding all 
> "pending" Links in a container, and removing them - while locked - as they 
> are selected.  Notice the call to pendingLinks.erase() while the lock is held.
>     
>     This prevents two threads from selecting the same pending Link.

So it is possible for multiple connections to match the same link? In which 
case, how do you know you've selected the correct one?


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6396/#review9965
-----------------------------------------------------------


On Aug. 6, 2012, 5:51 p.m., Kenneth Giusti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6396/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2012, 5:51 p.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Ted Ross.
> 
> 
> Description
> -------
> 
> Occasionally, the cluster tests will fail in the 
> test_federation_multilink_failover test with the following error:
> 
> cluster_tests.ShortTests.test_federation_multilink_failover 
> ...................................................................................................................
>  fail
> Error during test:  Traceback (most recent call last):
>     File 
> "/home/kgiusti/Desktop/work/qpid/trunk/build/qpid/src/tests/python/commands/qpid-python-test",
>  line 340, in run
>       phase()
>     File 
> "/home/kgiusti/Desktop/work/qpid/trunk/qpid/cpp/src/tests/cluster_tests.py", 
> line 992, in test_federation_multilink_failover
>       assert self._verify_federation(src_cluster[1], "FedX/two", 
> dst_cluster[1], "destQ2")
>   AssertionError
> 
> The problem is due to a race condition in the LinkRegistry code.  When a new 
> connection event occurs for a federation Link, the LinkRegistry attempts to 
> find a Link instance that is attempting to connect to the remote in order to 
> assign the connection.  The problem is due to the fact that the search for 
> the target link is done under a lock, but the assignment is done outside of 
> the lock (to prevent lock inversion).
> 
> The proposed fix has LinkRegistry hold all disconnected Links in a separate 
> container, and perform the search of that container (and the removal on 
> match) while holding a lock.
> 
> 
> This addresses bug QPID-4193.
>     https://issues.apache.org/jira/browse/QPID-4193
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/LinkRegistry.h 1368984 
>   /trunk/qpid/cpp/src/qpid/broker/LinkRegistry.cpp 1368984 
>   /trunk/qpid/cpp/src/qpid/cluster/Connection.cpp 1368984 
>   /trunk/qpid/cpp/src/qpid/cluster/UpdateClient.cpp 1368984 
> 
> Diff: https://reviews.apache.org/r/6396/diff/
> 
> 
> Testing
> -------
> 
> Federation and cluster unit tests.
> Ran test_federation_multilink_failover repeatedly with no crash.
> 
> 
> Thanks,
> 
> Kenneth Giusti
> 
>

Reply via email to