I guess that's my point... I don't know where else to check. There's nothing between the HAProxy box and the webservers but a switch. Certainly nothing I can get logs from.
If I drill into an example, I end up with nothing. See below for the logs of one scenario I tried to trace. But essentially: - I see an HAProxy 504 error for someone trying to load http://scoutmob.com/new-york/howitworks - Then I check the apache access logs (not the error logs because the error logs don't contain the user's IP addess, just the proxy's IP address) and see the connection was successful (passed a 200) - Then I run request-log-analyzer against the ruby log and see that the method that creates the /new-york/howitworks takes a max of .01s to load. >From the HAProxy Log: Jul 29 20:21:47 localhost haproxy[17405]: 68.174.74.20:55350[29/Jul/2010:20:20:57.478] http_proxy primary_farm/WS01 4/0/1/-1/50006 504 194 - - sH-- 14/14/14/13/0 0/0 "GET /new-york/howitworks HTTP/1.1" >From the WS01 Apache Log (grep'd on 29/Jul | 68.174.74.20 | howitworks): 68.174.74.20 - - [29/Jul/2010:20:21:31 -0400] "GET /images/counter/counter_small_line.png HTTP/1.1" 200 393 " http://scoutmob.com/new-york/howitworks" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" 68.174.74.20 - - [29/Jul/2010:20:20:57 -0400] "GET /new-york/howitworks HTTP/1.1" 200 0 "http://scoutmob.com/new-york/deal" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" 68.174.74.20 - - [29/Jul/2010:20:21:30 -0400] "GET /new-york/howitworks HTTP/1.1" 200 5698 "http://scoutmob.com/new-york/howitworks" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" 68.174.74.20 - - [29/Jul/2010:20:21:31 -0400] "GET /new-york/unexpired.json HTTP/1.1" 200 43098 "http://scoutmob.com/new-york/howitworks" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" Request-Log-Analyzer against Ruby Log (for the 20:00 to 21:00 hour): HTTP statuses returned -------------------------------------------------------------------------------- 200 - 3 hits - 100.0% - -------------------------------------------------------- Rails action cache hits -------------------------------------------------------------------------------- No hit - 3 hits - 100.0% - ----------------------------------------------------- -------------------------------------------------------------------------------- Request duration - by sum - Hits - Sum - Mean - StdDev - Min - Max -------------------------------------------------------------------------------- SessionsController#howitwork - 3 - 0.03s - 0.01s - 0.00s - 0.01s - 0.01s PhilD * * On Fri, Jul 30, 2010 at 3:04 PM, Willy Tarreau <[email protected]> wrote: > On Fri, Jul 30, 2010 at 02:57:15PM -0400, Phil Dupont wrote: > > Thanks, I'm certainly on the same page with you... I've certainly run > into > > this kind of stuff when there's a run away process and the like. > > > > That said, I've run the production log of ruby through their log analysis > > tool (request-log-analyzer). That tool will chew on the logs and give > you > > the min, max, and average load times for any transaction that occurs in > > Ruby... thing is... I don't see any query with a taking over a few > seconds. > > > > > > Essentially, from what I can tell, apache is responding within seconds... > or > > if it's not, it's not tossing any errors to the access or error logs. > > OK, but if you see haproxy report 504, you must check haproxy's logs to > find what requests experienced those errors, when, after how much time. > Maybe you'll find that you never see them in apache logs simply because > the apache process dies after running a loop and cannot log. It's common > not to find errors on the component which causes them. That's the reason > why you must first isolate those errors from haproxy's logs, and then > search them on the other components. If you see that haproxy says that > apache took 40 seconds while apache says it took 2 seconds on the same > requests, then you know it's somewhere between the two. But as long as > you don't have anything to compare, you're in the total darkness. > > Willy > >

