On Tue, 2010-10-26 at 19:24 +0200, Willy Tarreau wrote:
> Hi Maxime,

Hi Willy

> 
> On Tue, Oct 26, 2010 at 12:47:37PM -0400, Maxime Ducharme wrote:
> > 
> > Hi guys
> > 
> > I am new to haproxy and to this list; my experience with this software
> > has been very good so far.
> > 
> > Got a question about socket tuning. We have a web site running on 10
> > different httpd servers with 2 haproxy instances in front.
> > 
> > We configured 3 IPs on each haproxy, we get about 2200 req/s each, peak
> > time is 3500 req/s each.
> > 
> > Current load is actually very low on the haproxy boxes, but we have
> > noticed some slow access to the website. During analysis we found that
> > sometimes opening a TCP socket on the haproxy box is slower than opening
> > a socket directly on one of the httpd servers behind it.
> > 
> > The actual configuration is quite simple, here is a snippet:
> > 
> > global
> >         maxconn 32768
> >         nbproc 8
> > 
> > defaults
> >         log     global
> >         retries 3
> >         maxconn 32768
> >         contimeout 5000
> >         clitimeout 50000
> >         srvtimeout 50000
> > 
> > listen weblb1 1.1.1.1:80
> >         bind 1.1.1.2:80
> >         bind 1.1.1.3:80
> > 
> >         mode http
> >         balance roundrobin      
> > 
> >         option forwardfor
> >         option httpchk HEAD / HTTP/1.0
> >         option httpclose
> >         stats enable
> >         server web1 1.1.2.1:80 weight 10 check port 80
> >         ..
> >         server web10 1.1.2.10:80 weight 10 check port 80
> > 
> > 
> > We set nbproc to the number of CPU cores we have.
> > 
> > We noticed the problem by tracing an HTTP request with curl, e.g.:
> > 
> > 15:14:13.684549 * About to connect() to www.website.com port 80 (#0)
> > 15:14:13.685620 *   Trying 1.1.1.1... connected
> > --> 3 seconds here to open TCP connection
> > 15:14:16.796281 * Connected to www.website.com (1.1.1.1) port 80 (#0)
> > 15:14:16.797173 > GET / HTTP/1.1
> > --> httpd replies here in less than 1 second
> 
> A 3 second delay is a typical SYN retransmit.

Makes sense.

> 
> > This issue happens sometimes, not always.
> > 
> > My question: can someone point me in a direction to look for socket
> > optimization / debugging? I am currently unable to explain why it is
> > slow. I know this is not hardware related since it is a very powerful
> > box. I believe some tuning will make a big difference. Maybe we have
> > kernel tuning to do here; if someone can enlighten me it would be very
> > much appreciated.
> 
> Two things to look for :
>   - if you have ip_conntrack / nf_conntrack loaded, either you have to
>     unload it, or to properly tune it for your usage (I'd recommend the
>     former, it's easier).

good point, not loaded

> 
>   - check net.core.somaxconn. If it's 128, then your TCP stack is not
>     tuned for a high connection rate, and you're surely dropping incoming
>     connections from time to time. Try to first increase that single
>     parameter to 10000, restart haproxy and check if it changes anything.
> 
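
For reference, a minimal shell sketch of the somaxconn check described above. It assumes a Linux box where the value is exposed at /proc/sys/net/core/somaxconn. The SOMAXCONN_FILE override and the check_somaxconn helper are hypothetical names used only for illustration; 128 and 10000 are the values mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: report the listen backlog ceiling and flag the stock value.
# SOMAXCONN_FILE is a hypothetical override, handy for trying this on a copy.
SOMAXCONN_FILE="${SOMAXCONN_FILE:-/proc/sys/net/core/somaxconn}"

check_somaxconn() {
    # $1: file holding the current value
    # Returns 0 if the value is above 128, 1 if it is 128 or less,
    # and 2 if the file cannot be read.
    value=$(cat "$1" 2>/dev/null) || { echo "cannot read $1"; return 2; }
    if [ "$value" -le 128 ]; then
        echo "somaxconn=$value (probably too small for a high connection rate)"
        return 1
    fi
    echo "somaxconn=$value"
    return 0
}

check_somaxconn "$SOMAXCONN_FILE" || true

# To raise it (as root), then restart haproxy so the listening sockets
# are re-created with the larger backlog:
#   sysctl -w net.core.somaxconn=10000
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   net.core.somaxconn = 10000
```

The limit is applied when a socket starts listening, which is why the advice above includes restarting haproxy after changing the value.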

It was set to 128. I raised the value to 10000 and we see better results now.
A new problem appeared though:
Oct 28 19:07:40 v-2-fg09-d861-15 kernel: [735810.205858] TCP: drop open request from 1.1.1.1/42274
Oct 28 19:07:45 v-2-fg09-d861-15 kernel: [735815.237132] TCP: drop open request from 1.1.1.2/2847
Oct 28 19:07:50 v-2-fg09-d861-15 kernel: [735820.276368] TCP: drop open request from 1.1.1.3/3925
Oct 28 19:07:55 v-2-fg09-d861-15 kernel: [735825.308858] TCP: drop open request from 1.1.1.4/49952
...

I also see many half-open connections (SYN_RECV) in netstat:
# netstat -an |grep SYN_RECV |grep -cv grep
1426

Now I am looking at the tcp_max_syn_backlog value. I am thinking of
raising it as well, but I would like to have your opinion. We see
this issue when the rate reaches about 2300 req/s, only at the peak time
of day. The rest of the day is OK and response time is excellent.
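
A small shell sketch of how the half-open count and the related kernel settings can be watched together. It assumes Linux with `netstat` available; the helper names count_syn_recv and show_syn_settings are hypothetical, and the value 10000 in the final comment is only an illustrative number, not a recommendation from this thread.

```shell
#!/bin/sh
# Sketch only: count SYN_RECV sockets and show the SYN queue settings.

count_syn_recv() {
    # Reads `netstat -an` style output on stdin and prints how many
    # lines have SYN_RECV in the state column (6th field for TCP).
    awk '$6 == "SYN_RECV" { n++ } END { print n + 0 }'
}

show_syn_settings() {
    # Print the two sysctls most relevant to a full SYN queue, if readable.
    for key in tcp_max_syn_backlog tcp_syncookies; do
        f="/proc/sys/net/ipv4/$key"
        if [ -r "$f" ]; then
            echo "net.ipv4.$key = $(cat "$f")"
        fi
    done
}

show_syn_settings
if command -v netstat >/dev/null 2>&1; then
    echo "SYN_RECV sockets: $(netstat -an 2>/dev/null | count_syn_recv)"
fi

# To experiment with a larger SYN queue (as root, illustrative value):
#   sysctl -w net.ipv4.tcp_max_syn_backlog=10000
```

Running this once at peak time and once off-peak should show whether the SYN_RECV count approaches the configured backlog when the "drop open request" messages appear.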


> Note that you don't need 8 processes with that load, it will be harder to
> debug, health checks will not be synced, and stats will only be per-process.

Good, we now run 1 instance only.

> 
> > Another question :
> > 
> > can I enable stats on a particular IP ?
> 
> yes, simply put the "stats enable" statement in its own listen section.

thanks
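
A sketch of what a dedicated stats section could look like, written from a shell script so it can be syntax-checked before use. The section name `stats`, the port 8080, the URI /haproxy-stats and the write_stats_cfg helper are all assumptions made for illustration; `bind`, `stats enable`, `stats uri` and `haproxy -c -f` are existing haproxy keywords and options.

```shell
#!/bin/sh
# Sketch only: write a standalone stats section bound to a single IP
# and, if haproxy is installed, ask it to validate the file.

write_stats_cfg() {
    # $1: destination file
    cat > "$1" <<'EOF'
defaults
        mode http
        timeout connect 5000
        timeout client  50000
        timeout server  50000

listen stats
        bind 1.1.1.1:8080
        stats enable
        stats uri /haproxy-stats
EOF
}

cfg=$(mktemp)
write_stats_cfg "$cfg"

if command -v haproxy >/dev/null 2>&1; then
    # -c only checks the configuration, it does not bind or start anything
    haproxy -c -f "$cfg" || echo "haproxy rejected the sketch; adjust it for your version"
fi
```

In a real setup the `listen stats` section would go into the existing configuration file next to `weblb1`, and the `stats enable` line would be removed from `weblb1` if stats should only be reachable on that one address.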

> 
> Last, with version 1.4, you can also reduce the connection rate by using
> "option http-server-close" instead of "option httpclose". It will enable
> keep-alive on the client side. Do that only when you have fixed your
> issues, because doing so can mask the problem without fixing it, and you'll
> get it again later.

Thanks for this too, I will look into it afterwards.


> 
> Regards,
> Willy
> 
> 

have a nice day

Maxime

