Alan Conway wrote:
On Wed, 2009-04-15 at 16:39 +0100, Gordon Sim wrote:
Gordon Sim wrote:
I am seeing an automated build hang on the unit tests. It doesn't happen every time or on every machine (running on RHEL5). I _believe_ it was first introduced by the changes to SocketProxy in r758852[1] (can't reproduce the hang before that revision).

I can only reproduce the hang when running under valgrind. From adding in some debug logging, it appears that the proxy connection is never notified of any connection attempt and therefore keeps waiting to accept.

I.e. the SocketProxy thread spins in the loop starting SocketProxy.cpp:108. The main thread is meanwhile waiting for the connection to open in BrokerFixture.cpp:98.
A little more information on this. It appears that by introducing a short sleep before opening the connection in ProxyConnection (i.e. BrokerFixture.cpp:98) I can reproduce the hang quite easily.

I.e. it seems there is some sort of race between the SocketProxy thread and the main thread that is connecting. Specifically, it seems that when the SocketProxy thread gets to the select() on the listener socket (SocketProxy.cpp:111) before the client opens the connection, the SocketProxy thread never leaves the loop waiting for a connection attempt to accept.

Can anyone shed any light on why that might be the case?

Yup, you isolated the problem nicely. It's fixed in r765338. The
fd_set was initialized outside the loop, so if you didn't connect
before the first timeout it was left spinning with an empty fd_set. The
timeout was .5 sec so this hardly ever happened unless things were
slowed down, e.g. by valgrind or on some quirky host.

Thanks, Alan! Seems so obvious now you point it out, but I totally missed that!

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]