Hi Phil,

On Wed, Jul 28, 2010 at 02:06:57PM -0400, Phil Dupont wrote:
> I could use a hand... we recently switched to HAProxy to do some
> basic load-balancing for our site.  Nothing too fancy... basically just
> splitting traffic between two apache servers running a ruby on rails site.
> 
> Mostly, everything is fantastic.  However, periodically we get server
> timeouts (sH in the error logs) and users see the 504 gateway timeout error.
>  Not a ton, but a few hundred throughout the day.

That's very common with applications that have a few complex
requests which can take ages. A few in a day may mean that the CPU is
at 100% for as long as the time you have configured in your timeouts,
or that the database is at 100% I/O, but that such a period is too
short to show up in graphs or monitoring.

> What's killing me is that there doesn't appear to be a logical reason... CPU
> load is non-existent on all the boxes (less than 1%), memory use on the
> boxes are low (we have about a gig of ram free on each), connections are
> fine (at the most 30 concurrent apache connections on each box), and we
> don't see any run away processes on the MySQL database.

That would really match what I describe above. If you're not looking
at the exact moment the problem manifests itself, you can't see anything.

> Further, when I look at the apache and ruby error and connection logs.... I
> don't see any errors being tossed.

You should isolate the 504s in haproxy's logs; you'll get a lot of information
about where the time is spent and whether it is always the same requests
or not. I suspect only one request is affected, and you'll find "sH" flags
indicating that the server failed to respond in time, with the associated
time in the last field of the timers.
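
As a rough sketch of that isolation step, something like the following
could pull the 504/"sH" entries out of a log and count them per request.
It assumes the logs use the "option httplog" layout (timers as
Tq/Tw/Tc/Tr/Tt, then status, bytes, two cookie fields and the 4-character
termination state, with the quoted request at the end of the line). The
function names and the regular expression are only illustrative and may
need adjusting to your actual log lines.

```python
import re
from collections import Counter

# Assumed layout (HAProxy "option httplog"):
#   ... Tq/Tw/Tc/Tr/Tt status bytes req_cookie resp_cookie term_state ... "request"
# This pattern is an illustration, not an official parser.
LINE_RE = re.compile(
    r'(?P<timers>-?\d+/-?\d+/-?\d+/-?\d+/\+?-?\d+) '
    r'(?P<status>-?\d+) \+?\d+ \S+ \S+ '
    r'(?P<state>\S{4}) .*"(?P<request>[^"]*)"\s*$'
)


def find_server_timeouts(lines):
    """Return (request, total_time_ms) for each 504 whose state starts with 'sH'."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an HTTP log line in the assumed layout
        if m.group("status") == "504" and m.group("state").startswith("sH"):
            # The last timer (Tt) is the total session time in milliseconds.
            total_ms = int(m.group("timers").split("/")[-1].lstrip("+"))
            hits.append((m.group("request"), total_ms))
    return hits


def count_by_request(hits):
    """Count how many timeouts each distinct request line produced."""
    return Counter(request for request, _ in hits)
```

Feeding it the lines of your log file (for example via open() on
whatever path your syslog writes haproxy's messages to) and printing
count_by_request(...).most_common(10) should show quickly whether the
timeouts always hit the same few requests.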

> Basically, from what I can tell, for no good reason we're randomly getting
> timeouts...
> 
> Any ideas where I can look for the cause of the problem?  Anyone else
> encounter this?  Anything else I should consider looking at?

Quite frankly, the most common cause of 504s is long database requests.
But I don't know what the cause is in your case.

Regards,
Willy

