Hi,

On Sat, Jan 25, 2014 at 10:12:50AM +0100, pechspilz wrote:
> Hi,
>
> I'm seeing a very sporadic issue (every few days) in a TCP mode proxy
> and I'm at loss what could be causing it.
>
> After some time, haproxy is unable to establish a connection with the
> backend server. 20000 is the configured connect timeout. I can't see any
> exhausted resources in haproxy or on the frontend or backend server.
>
> Jan 25 06:29:17 sugardaddy haproxy[14013]: 123.123.123.123:55698
> [25/Jan/2014:06:28:57.210] f1/server1 1/-1/20000 0 sC 999/7/7/7/0 0/0

This "999" makes me think there's a 1000 file descriptors limit somewhere
in your setup. However, that should not cause a timeout, it should fail
immediately. Is it always 999 or does the value change? Are you running
with long connections or short ones? And could you please show the output
of "haproxy -vv"? Ideally your config as well (without private
information), just to see the maxconns, timeouts, etc...

> Once this happens, every other connection attempt to this proxy will be
> met with the same timeout. Restarting haproxy instantly makes it all
> work again.

With long connections, this could be explained by a server which does not
accept more than a certain number of concurrent connections. When you stop
haproxy, all connections are destroyed, so the server starts to accept them
again. But 7 connections per server seems so low...

> Other frontend/backend proxy configurations running in the
> same haproxy instance are not affected, it just happens to one proxy.

That could confirm the theory of a per-server limitation.

> I could be wrong but since the problem vanishes after restarting
> haproxy, I don't think it's an issue with the backend server. Restarting
> haproxy every hour "fixes" the problem but I'd prefer to keep haproxy
> running all day/night.

Well, I'd say that it's totally unacceptable to have to restart it for
anything other than a configuration change, so we need to sort this out and
find out whether it's a configuration issue, an environment issue or a bug.

> I tried the last 3 released 1.5 dev versions, it didn't make a
> difference.

OK, that's very useful information. I think that at some point, an strace
of the process and a tcpdump on the interface taken when the problem
appears will significantly help. I noticed that you changed your IPs in the
log, and I understand you don't want to publish internal information. So if
you have such traces available, feel free to send them to me off-list or to
provide me with a link to download them.

Thanks,
Willy
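
For reference, here is a rough sketch of how the trailing fields of the quoted log line can be split apart, to make it easier to check whether the "999" really stays constant across failures. It assumes the fields follow haproxy's documented TCP log layout (Tw/Tc/Tt, bytes read, termination state, actconn/feconn/beconn/srv_conn/retries, srv_queue/backend_queue). The names parse_tail and hit_connect_timeout are made up for this illustration and are not part of haproxy.

```python
import re

# Assumed layout of the end of a TCP-mode log line:
#   Tw/Tc/Tt bytes_read state actconn/feconn/beconn/srv_conn/retries srv_queue/backend_queue
TAIL = re.compile(
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) (?P<state>\S{2}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/"
    r"(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)$"
)


def parse_tail(line):
    """Return the trailing log fields as a dict, or None if they don't match."""
    match = TAIL.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Everything except the two-character termination state is numeric.
    return {k: (v if k == "state" else int(v)) for k, v in fields.items()}


def hit_connect_timeout(fields):
    """True when the connection to the server was never established (Tc == -1)
    and the session ended on a server-side timeout while connecting ("sC")."""
    return fields["tc"] == -1 and fields["state"] == "sC"


if __name__ == "__main__":
    sample = (
        "Jan 25 06:29:17 sugardaddy haproxy[14013]: 123.123.123.123:55698 "
        "[25/Jan/2014:06:28:57.210] f1/server1 1/-1/20000 0 sC 999/7/7/7/0 0/0"
    )
    fields = parse_tail(sample)
    print(fields["actconn"], fields["srv_conn"], hit_connect_timeout(fields))
```

Running this over all failing lines and comparing the actconn values against srv_conn should show quickly whether a process-wide limit or a per-server one is the more likely suspect.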

