On Wed, 2009-04-15 at 16:39 +0100, Gordon Sim wrote:
> Gordon Sim wrote:
> > I am seeing an automated build hang on the unit tests. It doesn't happen
> > every time or on every machine (running on RHEL5). I _believe_ it was
> > first introduced by the changes to SocketProxy in r758852[1] (can't
> > reproduce the hang before that revision).
> >
> > I can only reproduce the hang when running under valgrind. From adding
> > in some debug logging it appears to be due to the proxy connection
> > failing to be notified of any connection attempt and therefore keeps
> > waiting to accept.
> >
> > I.e. the SocketProxy thread spins in the loop starting
> > SocketProxy.cpp:108. The main thread is meanwhile waiting for the
> > connection to open in BrokerFixture.cpp:98.
>
> A little more information on this. It appears that introducing a short
> sleep before opening the connection in ProxyConnection (i.e.
> BrokerFixture.cpp:98) I can reproduce the hang quite easily.
>
> I.e. it seems there is some sort of race between the SocketProxy Thread
> and the main thread that is connecting. Specifically it seems that when
> the SocketProxy thread gets to the select() on the listener socket
> (SocketProxy.cpp:111) before the client opens the connection, then
> SocketProxy thread never leaves the loop waiting for a connection
> attempt to accept.
>
> Can anyone shed any light on why that might be the case?
Yup, you isolated the problem nicely. It's fixed in r765338. The fd_set was initialized outside the loop, and select() clears the set when it times out, so if you didn't connect before the first timeout it was left spinning with an empty fd_set. The timeout was 0.5 sec, so this hardly ever happened unless things were slowed down, e.g. by valgrind or on some quirky host.
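For anyone reading the archive later, here is a minimal sketch of the failure mode. It is not the actual SocketProxy code: waitReadable, lateFd and rebuildSet are names made up for illustration, a pipe can stand in for the listener socket, and the 10 ms timeout is arbitrary. The only point is that select() overwrites the fd_set it is given, so the set has to be rebuilt on every pass through the loop.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper, for illustration only.
// Waits for `fd` to become readable, giving up after `attempts` timeouts.
// After the first timeout it writes one byte to `lateFd` to simulate a
// client that shows up late. With rebuildSet == false the fd_set is built
// once, before the loop, which is the pre-fix behaviour described above.
bool waitReadable(int fd, int lateFd, int attempts, bool rebuildSet) {
    assert(fd >= 0 && fd < FD_SETSIZE);  // FD_SET is only valid in this range

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    for (int i = 0; i < attempts; ++i) {
        if (rebuildSet) {
            // select() modifies the set in place, so start from scratch.
            FD_ZERO(&readable);
            FD_SET(fd, &readable);
        }
        // The timeout may also be modified by select(), so it is
        // re-created on every iteration as well.
        timeval timeout;
        timeout.tv_sec = 0;
        timeout.tv_usec = 10000;  // 10 ms

        int n = select(fd + 1, &readable, nullptr, nullptr, &timeout);
        if (n > 0 && FD_ISSET(fd, &readable)) return true;

        if (i == 0) {
            // The "client" arrives only after the first timeout.
            char c = 'x';
            if (write(lateFd, &c, 1) != 1) return false;
        }
    }
    return false;  // never saw fd become readable
}
```

Called with the two ends of a pipe, the rebuildSet == false variant should keep timing out even after the byte has been written, because the set it passes to select() is empty from the second iteration on. The rebuildSet == true variant should see the byte on its second pass.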
