On Tue, 2010-10-26 at 19:24 +0200, Willy Tarreau wrote:
> Hi Maxime,

Hi Willy

>
> On Tue, Oct 26, 2010 at 12:47:37PM -0400, Maxime Ducharme wrote:
> >
> > Hi guys
> >
> > I am new to haproxy & list, my experience with this software is very
> > good yet.
> >
> > Got a question about sockets tuning. We have web site running on 10
> > different httpd with 2 haproxy in front.
> >
> > We configured 3 IPs on each haproxy, we get about 2200 req/s each, peak
> > time is 3500 req/s each.
> >
> > Current load is very low actually on haproxy boxes, but we have noticed
> > some slow access to the website. Doing analysis we found out that
> > sometime opening a TCP socket on haproxy box is slower than opening a
> > socket directly on one of httpd behind.
> >
> > The actual configuration is quite simple, here is snippet :
> >
> > global
> >     maxconn 32768
> >     nbproc 8
> >
> > defaults
> >     log global
> >     retries 3
> >     maxconn 32768
> >     contimeout 5000
> >     clitimeout 50000
> >     srvtimeout 50000
> >
> > listen weblb1 1.1.1.1:80
> >     bind 1.1.1.2:80
> >     bind 1.1.1.3:80
> >
> >     mode http
> >     balance roundrobin
> >
> >     option forwardfor
> >     option httpchk HEAD / HTTP/1.0
> >     option httpclose
> >     stats enable
> >     server web1 1.1.2.1:80 weight 10 check port 80
> >     ..
> >     server web10 1.1.2.10:80 weight 10 check port 80
> >
> >
> > We put nbprocs to the same amount of CPU cores we have.
> >
> > We noticed problem by tracing HTTP request with curl, ex:
> >
> > 15:14:13.684549 * About to connect() to www.website.com port 80 (#0)
> > 15:14:13.685620 * Trying 1.1.1.1... connected
> > --> 3 seconds here to open TCP connection
> > 15:14:16.796281 * Connected to www.website.com (1.1.1.1) port 80 (#0)
> > 15:14:16.797173 > GET / HTTP/1.1
> > --> httpd replies here in less than 1 second
>
> A 3 second delay is a typical SYN retransmit.

makes sense

>
> > This issue happens sometime, not always.
> >
> > My question, can someone point me a direction to look for for sockets
> > optimization / debugging.
> > I am currently unable to explain why it is
> > slow, I know this is not hardware related since it is very powerful box.
> > I believe some tuning will make a big difference. Maybe we have kernel
> > tuning to do in here, if someone can enlighten me it would be very
> > appreciated.
>
> Two things to look for :
>   - if you have ip_conntrack / nf_conntrack loaded, either you have to
>     unload it, or to properly tune it for your usage (I'd recommend the
>     former, it's easier).

good point, not loaded

>
>   - check sys.net.core.somaxconn. If it's 128, then your TCP stack is not
>     tuned for a high connection rate, and you're surely dropping incoming
>     connections from time to time. Try to first increase that single
>     parameter to 10000, restart haproxy and check if it changes anything.
>

It was set to 128. We raised the value to 10000 and we see better results now.

A new problem appeared though, which is :

Oct 28 19:07:40 v-2-fg09-d861-15 kernel: [735810.205858] TCP: drop open request from 1.1.1.1/42274
Oct 28 19:07:45 v-2-fg09-d861-15 kernel: [735815.237132] TCP: drop open request from 1.1.1.2/2847
Oct 28 19:07:50 v-2-fg09-d861-15 kernel: [735820.276368] TCP: drop open request from 1.1.1.3/3925
Oct 28 19:07:55 v-2-fg09-d861-15 kernel: [735825.308858] TCP: drop open request from 1.1.1.4/49952
...

I also see unreplied SYNs in netstat :

# netstat -an |grep SYN_RECV |grep -cv grep
1426

I am now looking at the tcp_max_syn_backlog value. I am thinking of
raising it as well, but I would like to have your opinion first.

We see this issue when the rate gets to 2300 req/s, only at peak time of
day. The rest of the day is OK and response time is excellent.

> Note that you don't need 8 processes with that load, it will be harder to
> debug, health checks will not be synced, and stats will only be per-process.

Good, we now run 1 instance only.

>
> > Another question :
> >
> > can I enable stats on a particular IP ?
>
> yes, simply put the "stats enable" statement in its own listen section.
thanks

>
> Last, with version 1.4, you can also reduce the connection rate by using
> "option http-server-close" instead of "option httpclose". It will enable
> keep-alive on the client side. Do that only when you have fixed your
> issues, because doing so can mask the problem without fixing it, and you'll
> get it again later.

thanks for this also, I will look into this one after.

>
> Regards,
> Willy
>
>

have a nice day

Maxime
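
For reference, the checks discussed in this thread can be sketched as a few shell helpers. This is only a minimal sketch, assuming a Linux box where the settings are exposed under /proc/sys (somaxconn as /proc/sys/net/core/somaxconn, tcp_max_syn_backlog as /proc/sys/net/ipv4/tcp_max_syn_backlog). The function names read_setting, count_syn_recv and backlog_is_full are hypothetical and are not part of haproxy or netstat.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the checks from this thread.

# Print a kernel setting given its path relative to /proc/sys,
# or "unavailable" if the file cannot be read (e.g. not on Linux).
read_setting() {
    cat "/proc/sys/$1" 2>/dev/null || echo "unavailable"
}

# Count lines containing SYN_RECV in `netstat -an` style output on stdin.
# grep -c exits non-zero when the count is 0, so mask that status.
count_syn_recv() {
    grep -c 'SYN_RECV' || true
}

# Succeed (exit 0) when a half-open connection count ($1) has reached
# a backlog limit ($2). This only compares two integers.
backlog_is_full() {
    [ "$1" -ge "$2" ]
}
```

Typical use would be `read_setting net/core/somaxconn`, `read_setting net/ipv4/tcp_max_syn_backlog`, and `netstat -an | count_syn_recv`. Whether a given SYN_RECV count really means the SYN queue is full depends on the kernel version and on the listen backlog, so the comparison should be treated as a rough hint rather than a diagnosis.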

