Thanks, I'm on the same page with you. I've certainly run into this kind of thing when there's a runaway process and the like.
That said, I've run Ruby's production log through a log analysis tool (request-log-analyzer). The tool chews through the logs and reports the min, max, and average load times for every transaction that occurs in Ruby. The thing is, I don't see any query taking over a few seconds. Essentially, from what I can tell, Apache is responding within seconds, or if it's not, it isn't writing any errors to the access or error logs.

PhilD

On Fri, Jul 30, 2010 at 9:52 AM, Willy Tarreau <[email protected]> wrote:

> Hi Phil,
>
> On Wed, Jul 28, 2010 at 02:06:57PM -0400, Phil Dupont wrote:
> > I could use a hand... we recently switched to HAProxy to do some
> > basic load-balancing for our site. Nothing too fancy... basically just
> > splitting traffic between two apache servers running a ruby on rails
> > site.
> >
> > Mostly, everything is fantastic. However, periodically we get server
> > timeouts (sH in the error logs) and users see the 504 gateway timeout
> > error. Not a ton, but a few hundred throughout the day.
>
> That's very common on some applications which have a few complex
> requests that can take ages. A few in a day may mean that there is
> 100% CPU for the time you have configured in your timeouts, or 100%
> I/O on the database, but that amount of time is not large enough to
> be reported in graphs or monitoring.
>
> > What's killing me is that there doesn't appear to be a logical reason...
> > CPU load is non-existent on all the boxes (less than 1%), memory use on
> > the boxes are low (we have about a gig of ram free on each), connections
> > are fine (at the most 30 concurrent apache connections on each box), and
> > we don't see any run away processes on the MySQL database.
>
> That would really match what I describe above. If you're not looking
> at the exact moment the problem manifests itself, you can't see anything.
>
> > Further, when I look at the apache and ruby error and connection logs....
> > I don't see any errors being tossed.
>
> You should isolate the 504 from haproxy's logs, you'll get many information
> about where the time is spent and whether those are always the same
> requests or not. I suspect only one request is concerned and you'll find
> "sH" flags indicating that the server has failed to respond in time, with
> the associated time in the last field of the timers.
>
> > Basically, from what I can tell, for no good reason we're randomly
> > getting timeouts...
> >
> > Any ideas where I can look for the cause of the problem? Anyone else
> > encounter this? Anything else I should consider looking at?
>
> Quite frankly, the most common cause for 504 are long database requests.
> But I don't know what's the cause in your case.
>
> Regards,
> Willy
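
For reference, the suggestion to isolate the 504s from haproxy's logs could be sketched roughly as below. This is only an illustration, not something posted in the thread: it assumes the logs use HAProxy's HTTP log format (`option httplog`), the sample lines are made up, and the names `LOG_RE`, `server_timeouts`, and `count_by_request` are hypothetical. It keeps only lines with status 504 and a termination state starting with "sH", then counts them per request so it is easy to see whether the same request is always involved. The last timer field is reported as the total time.

```python
import re
from collections import Counter

# Assumed layout: HAProxy HTTP log format ("option httplog"), i.e.
# client [date] frontend backend/server Tq/Tw/Tc/Tr/Tt status bytes
# req_cookie resp_cookie termination_state conn_counts queues "request"
LOG_RE = re.compile(
    r'(?P<client>\S+) \[(?P<date>[^\]]+)\] (?P<frontend>\S+) '
    r'(?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/(?P<tt>\+?\d+) '
    r'(?P<status>\d{3}) \S+ \S+ \S+ (?P<term>\S{4}) '
    r'.*"(?P<request>[^"]*)"'
)

# Made-up sample lines for illustration only; they are not from the thread.
SAMPLE = [
    'Jul 30 09:41:07 lb1 haproxy[2311]: 192.0.2.10:51234 '
    '[30/Jul/2010:09:40:37.120] www app/web1 3/0/1/-1/30004 504 194 - - '
    'sH-- 12/12/8/4/0 0/0 "GET /reports/42 HTTP/1.1"',
    'Jul 30 09:41:09 lb1 haproxy[2311]: 192.0.2.11:40022 '
    '[30/Jul/2010:09:41:09.010] www app/web2 2/0/1/45/48 200 5120 - - '
    '---- 11/11/7/3/0 0/0 {example.com} "GET /home HTTP/1.1"',
    'Jul 30 09:55:40 lb1 haproxy[2311]: 192.0.2.12:40990 '
    '[30/Jul/2010:09:55:10.500] www app/web2 1/0/0/-1/30001 504 194 - - '
    'sH-- 9/9/6/3/0 0/0 "GET /reports/42 HTTP/1.1"',
]


def server_timeouts(lines):
    """Yield (request, server, total_ms) for each 504 flagged 'sH'."""
    for line in lines:
        m = LOG_RE.search(line)
        if m and m["status"] == "504" and m["term"].startswith("sH"):
            # Tt is the last timer field; int() accepts a leading '+'.
            yield m["request"], m["server"], int(m["tt"])


def count_by_request(lines):
    """Count server-timeout 504s per request line."""
    return Counter(req for req, _server, _total in server_timeouts(lines))


if __name__ == "__main__":
    for request, hits in count_by_request(SAMPLE).most_common():
        print(hits, request)
```

To run this against a real log, the same functions can be fed an open file object instead of `SAMPLE`; the path to that file depends on the local syslog setup.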

