On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau <[email protected]> wrote:
> Hi Michael,
>
> On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
>> I'm trying to understand why our proxied requests have a much greater
>> chance of significant delay than non-proxied requests.
>>
>> The server is an 8-core (dual quad) Intel machine. Making requests
>> directly to the nginx backend is just far more reliable. Here's a
>> shell script output that just continuously requests a blank 0k image
>> file from nginx directly on its own port, and spits out a timestamp if
>> the delay isn't 0 or 1 seconds:
>>
>> Thu Mar 5 12:36:17 PST 2009
>> beginning continuous test of nginx port 8080
>> Thu Mar 5 12:38:06 PST 2009
>> Nginx Time is 2 seconds
>>
>>
>>
>> Here's the same test running through haproxy, simultaneously:
>>
>> Thu Mar 5 12:36:27 PST 2009
>> beginning continuous test of haproxy port 80
>> Thu Mar 5 12:39:39 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:39:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:39:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:03 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:45 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:41:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:01 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:29 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:38 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:43:05 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:43:15 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:25 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:30 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:33 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:39 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:46 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:54 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:07 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:16 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:45 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:54 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:05 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:32 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:53 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:47:40 PST 2009
>> Nginx Time is 3 seconds
>
> 3 seconds is typically a TCP retransmit. You have network losses somewhere
> from/to your haproxy. Would you happen to be running on a gigabit port
> connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
> many problems with broadcom netxtreme 2 (bnx2) caused by buggy firmwares,
> but it seems to work fine for other people after a firmware upgrade.
>
>> My sanitized haproxy config is here (mongrel backend was omitted for
>> brevity) :
>> http://pastie.org/408729
>>
>> Are the ACLs just too expensive?
>
> Not at all. Especially in your case. To reach 3 seconds of latency, you would
> need hundreds of thousands of ACLs, so this is clearly unrelated to your
> config.
>
>> Nginx is running with 4 processes, and the box shows mostly idle.
>
> ... which indicates that you aren't burning CPU cycles processing ACLs ;-)
>
> It is also possible that some TCP settings are too low for your load, but
> I don't know what your load is. Above a few hundreds-thousands of sessions
> per second, you will need to do some tuning, otherwise you can end up with
> similar situations.
>
> Regards,
> Willy
>
>
Hmm. I think it is a gigabit port connected to a 100 Mbps switch (all Dell rack-mount servers and switches).

The nginx backend runs on the same machine as haproxy and is referenced via 127.0.0.1 -- does that still involve a real network port? Should I run the whole test on localhost to isolate it from any network retransmits?

Here's a peek at the stats page after about a day of running (this should help show the current load):
http://pastie.org/409632
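
For reference, the kind of probe I have in mind for the localhost run could look roughly like the sketch below. This is not the original shell script, just a minimal Python approximation of it: the function names are made up for illustration, and the 2-second cutoff is my reading of "not 0 or 1 seconds" (the original presumably compared whole seconds). It uses only the standard library.

```python
import http.client
import time


def timed_get(host, port, path="/", timeout=10.0):
    """Return the wall-clock seconds taken by one complete GET request."""
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the whole exchange is timed
    finally:
        conn.close()
    return time.monotonic() - start


def is_slow(elapsed, threshold=2.0):
    """Approximate the original check: flag anything that is not 0 or 1 seconds."""
    return elapsed >= threshold


def probe(label, host, port, path="/", interval=0.1):
    """Request the same URL in a loop and print a timestamp for each slow reply.

    Runs until interrupted; a refused connection or timeout raises and stops it.
    """
    print(time.strftime("%c"))
    print(f"beginning continuous test of {label} port {port}")
    while True:
        elapsed = timed_get(host, port, path)
        if is_slow(elapsed):
            print(time.strftime("%c"))
            print(f"{label} time is {elapsed:.2f} seconds")
        time.sleep(interval)
```

Running `probe("haproxy", "127.0.0.1", 80, "/blank.gif")` and `probe("nginx", "127.0.0.1", 8080, "/blank.gif")` side by side on the server itself would keep every request on the loopback interface. The path `/blank.gif` is a placeholder for whatever the blank image is actually called.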

