Hi Matt,

On Thu, Jun 09, 2011 at 01:50:11PM -0700, Matt Christiansen wrote:
> Hi Willy,
>
> I agree the haproxy logs show that, but we also monitor the time spent
> processing the request, which takes into account GC, reading data off
> the FS and a number of other things inside the app, and I see no 3sec
> times in there or anything near it. Also I have no 3 sec outliers in
> the output from my test, so it seems a little weird that it says 3secs.
What I really hate about 3sec is that it's the common TCP retransmit
timeout, and it normally indicates packet losses. I had implicitly
excluded that possibility since it runs well with nginx on the same
machine, but it still must not be ruled out. Also, the time measured by
application servers generally does not include the time spent in
queues, so you should be very careful with this. For every component
there will always be an unmonitored area. For instance, haproxy cannot
know the time spent by the request in the system's backlog, which can
be huge under a SYN flood attack or when maxconn is too low.

> Also I have the connections set really high to prevent queueing for
> now, we usually only have around 1000-2000 connections open.
>
> uname -a
>
> Linux 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010 x86_64
> x86_64 x86_64 GNU/Linux

OK, RH5, so I agree you won't do TCP splicing on this one.

Could you check whether the number of TCP retransmits increases between
two runs (with netstat -s)? It's worth archiving a full copy before and
after the run in order to focus on anything we might discover there.

Also, would you happen to have nf_conntrack loaded (check with lsmod)?
When that is the case, we always see very ugly results; it mainly
affects connect times, but in your case I saw large response times too.

> haproxy -vv
>
> HA-Proxy version 1.4.15 2011/04/08
> Copyright 2000-2010 Willy Tarreau <[email protected]>
>
> Build options :
>   TARGET  = linux26
>   CPU     = generic
>   CC      = gcc
>   CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
>   OPTIONS = USE_PCRE=1

Everything's fine here.

(...)

> My config

Everything is OK here too.

You said that numbers slightly improved with tcp-smart-accept and
tcp-smart-connect. Normally that is caused by a congested network or by
losses. What really puzzles me is that while those issues are very
common, I don't see why they wouldn't show up with nginx too.

Oh, one thing I forgot which can make a difference: buffer sizes.
The larger the buffer, the more smoothly losses will be absorbed,
because they will induce fewer timeouts/RTTs. I don't know what size
nginx uses, but I remember it has dynamic buffer sizes. Haproxy
defaults to 16 kB. You can try increasing it to 64 kB and see whether
it changes anything :

    global
        tune.bufsize 65536

Maybe you should run a tcpdump between haproxy and the server, or even
better, on the haproxy machine AND on one of the servers (you can
disable a number of servers if it's a test config). That way we'll know
how the response times are distributed.

Regards,
Willy
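P.S.: if it helps, the retransmit check could be scripted roughly like
this (just a sketch; the retrans_delta helper and the file names are
illustrative, nothing shipped with haproxy):

```shell
# Print the retransmit-related counters that changed between two saved
# "netstat -s" dumps. Helper and file names are illustrative only.
retrans_delta() {
    # $1 = dump taken before the test run, $2 = dump taken after
    diff "$1" "$2" | grep -i retrans
}

# Typical usage around a test run:
#   netstat -s > before.txt
#   ... run the load test ...
#   netstat -s > after.txt
#   retrans_delta before.txt after.txt
#
# And to see whether connection tracking is loaded:
#   lsmod | grep -i conntrack
```

If the counters shown by retrans_delta jump significantly during a run,
that would point at the packet-loss theory above.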

