Hi,

On Sat, Jan 25, 2014 at 10:12:50AM +0100, pechspilz wrote:
> Hi,
> 
> I'm seeing a very sporadic issue (every few days) in a TCP mode proxy 
> and I'm at loss what could be causing it.
> 
> After some time, haproxy is unable to establish a connection with the 
> backend server. 20000 is the configured connect timeout. I can't see any 
> exhausted resources in haproxy or on the frontend or backend server.
> 
> Jan 25 06:29:17 sugardaddy haproxy[14013]: 123.123.123.123:55698 
> [25/Jan/2014:06:28:57.210] f1/server1 1/-1/20000 0 sC 999/7/7/7/0 0/0

This "999" makes me think there's a limit of 1000 file descriptors somewhere
in your setup. However, that should not cause a timeout; it should fail
immediately. Is it always 999 or does the value change? Are you running with
long connections or short ones?
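For reference, here is a rough sketch of how the fields of that log line can
be pulled apart to watch whether the "999" moves over time. It assumes the
standard TCP log layout (timers Tw/Tc/Tt, bytes read, termination state, then
actconn/feconn/beconn/srv_conn/retries). The names LINE, PATTERN and
parse_counters are made up for this illustration.

```python
import re

# The log line from the report, joined back onto one line.
LINE = ("123.123.123.123:55698 [25/Jan/2014:06:28:57.210] f1/server1 "
        "1/-1/20000 0 sC 999/7/7/7/0 0/0")

# Assumed layout: Tw/Tc/Tt bytes_read state actconn/feconn/beconn/srv_conn/retries
PATTERN = re.compile(
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>-?\d+) "
    r"(?P<bytes_read>\d+) (?P<state>\S{2}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)"
    r"/(?P<srv_conn>\d+)/(?P<retries>\d+)"
)

def parse_counters(line):
    """Return the timers and connection counters as a dict, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Everything except the two-letter termination state is numeric.
    return {k: (v if k == "state" else int(v))
            for k, v in m.groupdict().items()}

if __name__ == "__main__":
    fields = parse_counters(LINE)
    print(fields["actconn"], fields["srv_conn"], fields["tc"], fields["state"])
```

Here Tc is -1 and the state is "sC", which is consistent with the connection
to the server never being established before the 20000 ms timeout expired.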

And could you please show the output of "haproxy -vv"? Ideally your config
as well (without private information), just to see the maxconns, timeouts,
etc.
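To check the file descriptor theory on Linux, the limits that actually apply
to the running process can be read from /proc/<pid>/limits. Below is a small
sketch that extracts the "Max open files" row from that text. The function
name and the SAMPLE values are invented for the example and are not taken
from your system.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row, or None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            parts = line.split()
            # parts: ['Max', 'open', 'files', soft, hard, 'files']
            return parts[3], parts[4]
    return None

# Made-up sample in the layout used by /proc/<pid>/limits on Linux.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""

if __name__ == "__main__":
    print(parse_open_files_limit(SAMPLE))
```

In practice the text would come from reading /proc/<pid>/limits for the
haproxy pid (14013 in the log line above) instead of SAMPLE.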

> Once this happens, every other connection attempt to this proxy will be 
> met with the same timeout. Restarting haproxy instantly makes it all 
> work again.

With long connections, this could be explained by a server which would not
accept more than a certain number of concurrent connections. When you stop
haproxy, all connections are destroyed so the server starts to accept them
again. But 7 connections per server seems so low...
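One way to test this theory while the problem is occurring is to connect to
the server directly, bypassing haproxy, and see whether the connection is
accepted, refused or times out. This is only a minimal sketch: the function
name probe is made up, and the host and port below are placeholders to be
replaced with the real server address.

```python
import socket

def probe(host, port, timeout=5.0):
    """Try one TCP connect and report 'connected', 'timeout' or the error."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No SYN-ACK within the timeout, like the "sC" case seen by haproxy.
        return "timeout"
    except OSError as exc:
        # For example a refused connection or an unreachable host.
        return "error: %s" % exc
    sock.close()
    return "connected"

if __name__ == "__main__":
    # Placeholder address, not taken from the report.
    print(probe("127.0.0.1", 8080, timeout=2.0))
```

If this also times out from the haproxy machine while haproxy is failing,
the limit is more likely on the server side or somewhere on the path.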

> Other frontend/backend proxy configurations running in the 
> same haproxy instance are not affected, it just happens to one proxy.

That could confirm the theory of a per-server limitation.

> I could be wrong but since the problem vanishes after restarting 
> haproxy, I don't think it's an issue with the backend server. Restarting 
> haproxy every hour "fixes" the problem but I'd prefer to keep haproxy 
> running all day/night.

Well, I'd say that it's totally unacceptable to have to restart it if it's
not for a configuration change, so we need to sort this out to find whether
it's a configuration issue, an environment issue or a bug.

> I tried the last 3 released 1.5 dev versions, it didn't make a 
> difference.

OK that's very useful information.

I think that at some point, an strace of the process and a tcpdump on
the interface when the problem appears will significantly help. I noticed
that you changed your IPs in the log, and I understand you don't want to
publish internal information. So if you have such traces available, feel
free to send them to me off-list or to provide me with a link to download
them.

Thanks,
Willy

