Alan Conway wrote:
On Wed, 2009-04-15 at 16:39 +0100, Gordon Sim wrote:
Gordon Sim wrote:
I am seeing an automated build hang on the unit tests. It doesn't happen
every time or on every machine (running on RHEL5). I _believe_ it was
first introduced by the changes to SocketProxy in r758852[1] (can't
reproduce the hang before that revision).
I can only reproduce the hang when running under valgrind. From adding
some debug logging, it appears that the proxy connection is never
notified of any connection attempt and therefore keeps waiting to
accept.
I.e. the SocketProxy thread spins in the loop starting
SocketProxy.cpp:108. The main thread is meanwhile waiting for the
connection to open in BrokerFixture.cpp:98.
A little more information on this: by introducing a short sleep before
opening the connection in ProxyConnection (i.e. BrokerFixture.cpp:98),
I can reproduce the hang quite easily.
I.e. it seems there is some sort of race between the SocketProxy Thread
and the main thread that is connecting. Specifically it seems that when
the SocketProxy thread gets to the select() on the listener socket
(SocketProxy.cpp:111) before the client opens the connection, the
SocketProxy thread never leaves the loop waiting for a connection
attempt to accept.
Can anyone shed any light on why that might be the case?
Yup, you isolated the problem nicely. It's fixed in r765338. The
fd_set was initialized outside the loop, so if you didn't connect
before the first timeout it was left spinning with an empty fd_set. The
timeout was .5 sec so this hardly ever happened unless things were
slowed down, e.g. by valgrind or on some quirky host.
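The pattern described above can be sketched as follows. This is a
minimal illustration of rebuilding the fd_set on every pass, not the
code from r765338; waitForPendingConnection and makeLoopbackListener
are hypothetical names, and the timeout and retry parameters are
assumptions. select() overwrites the set it is given, so after a timeout
the set is empty and has to be refilled before the next call.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical helper (not the real SocketProxy code): returns true once
// listenFd has a pending connection, or false after maxTimeouts timeouts
// or on a select() error.
bool waitForPendingConnection(int listenFd, int timeoutMs, int maxTimeouts) {
    // FD_SET is only defined for descriptors below FD_SETSIZE.
    assert(listenFd >= 0 && listenFd < FD_SETSIZE);
    for (int i = 0; i < maxTimeouts; ++i) {
        // Rebuild the set on every iteration: select() modifies it in
        // place and leaves it empty after a timeout.
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(listenFd, &readable);

        // select() may also modify the timeout, so reset it each time.
        timeval tv;
        tv.tv_sec = timeoutMs / 1000;
        tv.tv_usec = (timeoutMs % 1000) * 1000;

        int n = select(listenFd + 1, &readable, nullptr, nullptr, &tv);
        if (n < 0) return false;  // error; a real loop might retry on EINTR
        if (n > 0 && FD_ISSET(listenFd, &readable)) return true;
        // n == 0: timed out, loop again with a freshly filled set.
    }
    return false;
}

// Hypothetical helper for trying the function out: listens on an
// ephemeral loopback port and stores the bound address in addr.
// Returns the listening descriptor, or -1 on failure.
int makeLoopbackListener(sockaddr_in& addr) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), len) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If FD_ZERO/FD_SET were hoisted above the for loop, the first timeout
would leave the set empty and every later select() call would watch
nothing, which matches the spinning behaviour seen under valgrind.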
Thanks, Alan! Seems so obvious now you point it out, but I totally
missed that!
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]