Hi Jay,

On Fri, Jul 20, 2012 at 02:57:54PM -0400, Jay Levitt wrote:
> I wrote:
> >We have a simple HTTP haproxy setup---one front end, one back end, two
> >servers. (Those servers, in turn, run Apache/Passenger/Rails.) When we
> >deploy our app, we disable one server in haproxy (using the admin
> >socket), restart the app, re-enable the server, and repeat on the second
> >server, in an attempt to get zero-downtime deploys.  This works fine in
> >a makeshift capistrano task, but when we do our actual deploy, haproxy
> >gets <NOSRV> for nearly a minute.
> [snip]
> 
> I updated our config with each server's maxconn set to 32, which (if I 
> understand keepalive socket requirements, which I don't) is double the 
> number of actual connections being made to our 16 passenger workers.

Indeed, I didn't know you were that limited on the backend servers!
Of course, you need to keep your maxconn lower than the server's limit.
You should also leave at least 1 or 2 slots free for health checks and
for your own maintenance and tests, so maybe try 14. But I don't expect
it to be much better anyway.
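For reference, the server lines would then look something like this
(names and addresses are illustrative, not taken from your setup):

    backend back1
        # keep per-server maxconn below the 16 passenger workers,
        # leaving a slot or two free for health checks and manual tests
        server server1 10.0.0.1:80 check maxconn 14
        server server2 10.0.0.2:80 check maxconn 14
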

> Now, instead of 503's, we get 504's:
> 
> Jul 20 13:33:12 404586-front1 haproxy[12193]: front1:44879 
> [20/Jul/2012:13:32:42.098] front1 front1server2 0/0/0/-1/30001 504 194 - 
> - sH-- 54/54/54/32/0 0/0 "GET /some/page HTTP/1.1"

That's expected: the excess connections are in the backend server's TCP
SYN backlog. This means the system has acknowledged the connection but
the application server has not yet performed accept() on it. Haproxy has
no way to know that; all it knows is that its connection was accepted,
the request was sent, and the server never replied.
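This is easy to reproduce. Here is a minimal Python sketch (not from the
original mail) showing that a client's connect() completes as soon as the
kernel queues the connection, even though the application never calls
accept() -- exactly the situation that produces the sH-- 504 above:

```python
import socket

# Listener with a small backlog; the "application" never calls accept().
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(4)
port = srv.getsockname()[1]

# The client's connect() succeeds: the kernel completed the TCP
# handshake and queued the connection, with no accept() involved.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.settimeout(2)
cli.connect(("127.0.0.1", port))
cli.sendall(b"GET / HTTP/1.0\r\n\r\n")  # request goes out, nobody reads it

# From haproxy's point of view everything looks fine up to here;
# it will now wait for a reply that never comes, then log a 504.
print("connected without accept():", cli.fileno() >= 0)
```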

> If I'm reading this right, we now have 54 connections, 32 of which were 
> to server2, so server2 can't take any more load.  Right?

Exactly!

> This makes me think it's something on our backend; the last good request 
> showed this connection/queue state:
> 
> 2/2/2/0/0 0/0

Yes, clearly. It's abnormal for some requests to take that long to be processed.

> and that's pretty normal for us.  So I think this may be nothing to do 
> with haproxy, and somehow during a deploy of server1, server2 isn't 
> taking any new connections.  Is that how it reads to you?

Yes. You should then keep your maxconn lower than 16 and check your logs
to see which requests take a long time. The "halog" tool is in haproxy's
contrib directory in the source package; you'd better take the latest
version for this. Some flags are very handy, such as sorting URLs by
response time:

     halog -uto < access.log

It will report the URLs which consume the most server time first. That
way you'll more easily find what is wrong (eg: you might notice that a
heavy SQL query is common to all the slow requests).
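If building halog is inconvenient, a rough stand-in for this particular
sort can be scripted. The sketch below (my own, simplified; the field
layout assumes the default HTTP log format, and it only considers Tr, the
server response time) aggregates server time per URL and prints the most
expensive first:

```python
import re
from collections import defaultdict

# Match the five timers Tq/Tw/Tc/Tr/Tt, the status code, and the URL
# from a default-format haproxy HTTP log line.
LOG_RE = re.compile(
    r'(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+) (\d{3}) .* '
    r'"(?:GET|POST|HEAD|PUT|DELETE) (\S+)'
)

def url_times(lines):
    totals = defaultdict(int)
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        tr = int(m.group(4))  # Tr: server response time in ms
        if tr >= 0:           # -1 means aborted before a response
            totals[m.group(7)] += tr
    # Most expensive URLs first, like halog's ordering.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

sample = [
    'haproxy[12193]: front1:44879 [20/Jul/2012:13:32:42.098] front1 '
    'front1server2 0/0/0/250/251 200 194 - - ---- 2/2/2/0/0 0/0 '
    '"GET /some/page HTTP/1.1"',
]
print(url_times(sample))
```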

Regards,
Willy

