On Fri, Mar 06, 2009 at 11:23:02AM -0800, Michael Fortson wrote:
> On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau <w...@1wt.eu> wrote:
> > Hi Michael,
> >
> > On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
> >> I'm trying to understand why our proxied requests have a much greater
> >> chance of significant delay than non-proxied requests.
> >>
> >> The server is an 8-core (dual quad) Intel machine. Making requests
> >> directly to the nginx backend is just far more reliable. Here's a
> >> shell script output that just continuously requests a blank 0k image
> >> file from nginx directly on its own port, and spits out a timestamp if
> >> the delay isn't 0 or 1 seconds:
> >>
> >> Thu Mar 5 12:36:17 PST 2009
> >> beginning continuous test of nginx port 8080
> >> Thu Mar 5 12:38:06 PST 2009
> >> Nginx Time is 2 seconds
> >>
> >>
> >>
> >> Here's the same test running through haproxy, simultaneously:
> >>
> >> Thu Mar 5 12:36:27 PST 2009
> >> beginning continuous test of haproxy port 80
> >> Thu Mar 5 12:39:39 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:39:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:39:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:03 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:45 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:41:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:01 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:29 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:38 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:43:05 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:43:15 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:25 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:30 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:33 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:39 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:46 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:54 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:07 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:16 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:45 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:54 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:05 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:32 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:53 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:47:40 PST 2009
> >> Nginx Time is 3 seconds
> >
> > 3 seconds is typically a TCP retransmit (the default initial retransmit
> > timeout). You have network losses somewhere from/to your haproxy. Would
> > you happen to be running on a gigabit port connected to a 100 Mbps
> > switch ? What type of NIC is this ? I've seen many problems with
> > Broadcom NetXtreme II (bnx2) NICs caused by buggy firmware, but they
> > seem to work fine for other people after a firmware upgrade.
> >
> >> My sanitized haproxy config is here (mongrel backend was omitted for 
> >> brevity) :
> >> http://pastie.org/408729
> >>
> >> Are the ACLs just too expensive?
> >
> > Not at all, especially in your case. To reach 3 seconds of latency, you
> > would need hundreds of thousands of ACLs, so this is clearly unrelated
> > to your config.
> >
> >> Nginx is running with 4 processes, and the box shows mostly idle.
> >
> > ... which indicates that you aren't burning CPU cycles processing ACLs ;-)
> >
> > It is also possible that some TCP settings are too low for your load, but
> > I don't know what your load is. Above a few hundreds to thousands of
> > sessions per second, you will need to do some tuning, otherwise you can
> > end up in similar situations.
> >
> > Regards,
> > Willy
> >
> >
> 
> Hmm. I think it is gigabit connected to 100 Mbps (all Dell rack-mount
> servers and switches).

OK, then please check with ethtool whether your port is running in half
or full duplex :

# ethtool eth0

Most often, 100 Mbps switch ports are forced to 100-full without autoneg,
and the gigabit ports in front of them fall back to half duplex, thinking
they are facing hubs.
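
For reference, a mismatch typically shows up like this in the ethtool
output (sample output only, your interface name and exact values will
differ) :

  Settings for eth0:
          Speed: 100Mb/s
          Duplex: Half
          Auto-negotiation: on
          Link detected: yes

If the switch side really is forced to 100-full, you generally have to
force the server side the same way, for instance :

  # ethtool -s eth0 speed 100 duplex full autoneg off

(adjust "eth0" to whatever interface your box actually uses).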

> The nginx backend runs on the same machine as
> haproxy and is referenced via 127.0.0.1 -- does that still involve a
> real network port? Should I try the test all on localhost to isolate
> it from any networking retransmits?

Yes, if you can do that, it would be nice. If the issue persists,
we'll have to look at the network stack tuning, but that gets harder
because it depends on the workload. Also, please provide the output
of "netstat -s".

> Here's a peek at the stats page after about a day of running (this
> should help demonstrate current loading)
> http://pastie.org/409632

I'm seeing something odd here. A lot of mongrel servers experience
connection retries. Are they located on the same machine or are they
on the network ?

I suspect that your nginx backend is "localhost-http", though I'm
not sure from the stats output here ; the columns are not easy to
follow. BTW, instead of copy-pasting the stats output, you can
simply save the HTML page : it's self-contained and will appear
the same to whoever consults it.
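
If scripting it is easier for you, the same data can also be dumped in
CSV form by appending ";csv" to the stats URI (assuming your version
supports it), for instance :

  # curl -o haproxy-stats.csv 'http://127.0.0.1:80/haproxy?stats;csv'

(the "/haproxy?stats" part is only an example ; use whatever "stats uri"
your configuration really declares).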

Regards,
Willy

