Hi,
you should switch net.ipv4.tcp_tw_recycle off; you already have
tcp_tw_reuse on, which serves the same purpose and is less dangerous with
NATted clients. http://www.serverphorums.com/read.php?10,182544

Lukas
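A minimal sketch of the change suggested above, assuming the settings are
kept in a sysctl drop-in file; the file name is only an example:

    # /etc/sysctl.d/99-timewait.conf (example path)
    # keep reuse of TIME_WAIT sockets for outgoing connections
    net.ipv4.tcp_tw_reuse = 1
    # turn recycle off; it breaks clients sitting behind NAT
    net.ipv4.tcp_tw_recycle = 0

Apply it with "sysctl -p /etc/sysctl.d/99-timewait.conf", or try it live
first with "sysctl -w net.ipv4.tcp_tw_recycle=0".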
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: RE: Haproxy timing issues
> Date: Wed, 2 Nov 2011 18:17:58 +0000
>
> Hi,
>
> Yeah, the clients are not the problem; we are using 5 different
> datacenters with 5 machines each, so ~25 machines. Hardcore loadtesting :)
> Btw, the loadtests are done transatlantic, so that is adding latency etc.
>
> After some more testing yesterday we found exactly what you mentioned
> here: using stud with too many processes made the results much worse.
> The perfect setup turned out to be stud with n=3 and haproxy nbproc=1.
> Increasing n to 4, 5, 6, ... made the results much worse.
>
> When I got these results I was using stud with n=6, which caused a lot of
> response time problems. However, I don't see these response times in the
> haproxy logs now when running with n=3. So how could stud with n=6 affect
> the response time on the "backend" in the haproxy logs?
>
> We are currently using the latest version of stud from github,
> bumptech-stud-0.2-76-g8012fe3. Are the "emericbr patches" merged in
> there, or is that a fork?
>
> The loadtest client is doing a renegotiation for every connection. The
> scenario contains 3 small images.
> Each connection makes 3 requests, repeated 10 times, with a 3-7 s wait
> between requests.
> This is to keep the connection open as long as possible and get many
> active connections. (We also have scenarios doing a lot of conns/s etc.)
>
> Yeah, the Aloha would have been cool to test. But this is not for us,
> this is for a customer :)
>
> These are my main sysctl values, which gave me a visible performance
> improvement:
> net.ipv4.tcp_max_syn_backlog=262144
> net.ipv4.tcp_syncookies=0
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_no_metrics_save=1
> net.core.somaxconn=262144
>
> net.ipv4.ip_local_port_range=1024 65536
> net.ipv4.tcp_tw_recycle=1
>
> These are some more I have tried, but they did not give me as much
> improvement:
> #net.ipv4.tcp_rmem=4096 87380 16777216
> #net.ipv4.tcp_wmem=4096 65536 16777216
> #net.ipv4.tcp_fin_timeout = 3
> #net.ipv4.tcp_max_orphans = 262144
> #net.ipv4.tcp_synack_retries = 2
> #net.ipv4.tcp_syn_retries = 2
>
> #net.core.rmem_max=16777216
> #net.core.wmem_max=16777216
> #net.core.netdev_max_backlog = 262144
>
> /E
>
>
> -----Original Message-----
> From: Baptiste [mailto:[email protected]]
> Sent: 1 November 2011 16:08
> To: Erik Torlen
> Cc: [email protected]
> Subject: Re: Haproxy timing issues
>
> Hi,
>
> First question: are you sure you're reaching the limit of haproxy/varnish
> and not the limit of your client?
> Mainly concerning the increasing response times.
>
> How many CPUs do you have in your VM? Starting too many stud processes
> could be counter-productive.
> I doubt doing CPU affinity in a VM improves anything :)
>
> Concerning the logs, the times we can see on your client side are very
> high! Too high :)
> 3-4 s for HAProxy to get the full request.
>
> How are you running stud?
> Which options? Are you using the one with the emericbr patches?
> Are you requesting with the same SSL session ID, or do you negotiate a
> new one for each connection?
>
> Have you checked your network statistics, on both the client and server
> side? (netstat -in and netstat -s)
> Are there a lot of drops, retransmissions, congestion, etc.?
>
> On your last log line, we can see that HAProxy took 22s to establish a
> TCP connection to your Varnish...
>
> Can you share your stud, haproxy, and varnish configurations, the version
> of each, and the startup parameters for Varnish?
> What kind of tool do you use on your client to run your load test?
> Which sysctls have you already tuned?
>
> Unfortunately, the Aloha does not run on Amazon :)
>
> cheers,
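As an aside, a minimal way to pull out the network statistics Baptiste is
asking about, assuming standard Linux net-tools (the grep pattern is only
an example; counter wording differs between kernel versions):

    # per-interface errors/drops on the client and on the stud/haproxy box
    netstat -in
    # protocol counters; watch for retransmits, listen drops and overflows
    netstat -s | grep -iE 'retrans|drop|overflow|prune|listen'

Running the commands twice, a few seconds apart, and comparing the output
makes it easier to see which counters are actually moving during the test.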
> On Tue, Nov 1, 2011 at 9:16 PM, Erik Torlen <[email protected]> wrote:
> > Hi,
> >
> > I am currently (and have been from time to time over the last weeks)
> > doing some heavy loadtesting against haproxy, with stud in front of it
> > handling the SSL.
> >
> > My loadtest has been focused on SSL traffic through stud against
> > haproxy on Amazon EC2.
> >
> > Our current problem is that we cannot get more than ~30k active
> > connections (~150 conns/s) before we start to see increased response
> > times (10-60 s) on the client side. Running with 38k connections now
> > and seeing much higher response times.
> >
> > The setup is:
> > 1 instance running haproxy + stud
> > 2 instances running varnish, serving 3 cached images
> >
> > Varnish has a 100% cache hit ratio, so nothing goes to the backend.
> >
> > We have tried using m1.xlarge and c1.xlarge. The m1.xlarge uses almost
> > 100% CPU during the loadtests, while the c1.xlarge has a lot of
> > resources left (stud using a few percent per process) and haproxy at
> > ~60-70% CPU.
> > The only difference is that the c1.xlarge gives noticeably better
> > response times before the actual problem, where response times start
> > increasing.
> >
> > Haproxy is running with nbproc=1.
> > Stud is running with n=6 and a shared session cache. (Tried it with
> > n=3 as well.)
> >
> > From the haproxy logging I could see the time it takes to establish a
> > connection to the backend and receive the data:
> >
> > Haproxy.log
> > Nov 1 18:40:35 127.0.0.1 haproxy[18511]: x.x.x.x:54113 [01/Nov/2011:18:39:40.273] varnish varnish/varnish1 4519/0/73/50215/54809 200 2715 - - ---- 238/236/4/5/0 0/0 "GET /assets/images/icons/elite_logo_beta.png HTTP/1.1"
> > Nov 1 18:40:35 127.0.0.1 haproxy[18511]: x.x.x.x:55635 [01/Nov/2011:18:39:41.547] varnish varnish/varnish1 3245/0/81/50207/53535 200 1512 - - ---- 238/236/3/4/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > ...
> > Nov 1 18:40:44 127.0.0.1 haproxy[18511]: x.x.x.x:34453 [01/Nov/2011:18:39:25.330] varnish varnish/varnish1 3082/0/225/32661/79559 200 1512 - - ---- 234/232/1/2/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 18:40:44 127.0.0.1 haproxy[18511]: x.x.x.x:53731 [01/Nov/2011:18:39:25.036] varnish varnish/varnish1 3377/0/216/32669/79854 200 1725 - - ---- 233/231/0/1/0 0/0 "GET /assets/images/create/action_btn.png HTTP/1.1"
> >
> > Haproxy.err (NOTE: 504 errors here)
> >
> > Nov 1 18:40:11 127.0.0.1 haproxy[18511]: x.x.x.x:34885 [01/Nov/2011:18:39:07.597] varnish varnish/varnish1 4299/0/27/-1/64330 504 194 - - sH-- 10916/10914/4777/2700/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 18:40:12 127.0.0.1 haproxy[18511]: x.x.x.x:58878 [01/Nov/2011:18:39:12.621] varnish varnish/varnish2 314/0/55/-1/60374 504 194 - - sH-- 3692/3690/3392/1623/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> >
> > Nov 1 18:40:18 127.0.0.1 haproxy[18511]: x.x.x.x:35505 [01/Nov/2011:18:39:42.670] varnish varnish/varnish1 3515/0/22078/10217/35811 200 1512 - - ---- 1482/1481/1238/710/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 18:40:18 127.0.0.1 haproxy[18511]: x.x.x.x:40602 [01/Nov/2011:18:39:42.056] varnish varnish/varnish1 4126/0/22081/10226/36435 200 1512 - - ---- 1475/1474/1231/703/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> >
> > Here are the logs from running haproxy with varnish as a backend on
> > the local machine:
> >
> > Haproxy.log
> > Nov 1 20:00:52 127.0.0.1 haproxy[18953]: x.x.x.x:38552 [01/Nov/2011:20:00:45.157] varnish varnish/local_varnish 7513/0/0/0/7513 200 1725 - - ---- 4/3/0/1/0 0/0 "GET /assets/images/create/action_btn.png HTTP/1.1"
> > Nov 1 20:00:54 127.0.0.1 haproxy[18953]: x.x.x.x:40850 [01/Nov/2011:20:00:48.219] varnish varnish/local_varnish 6524/0/0/0/6524 200 1725 - - ---- 2/1/0/1/0 0/0 "GET /assets/images/create/action_btn.png HTTP/1.1"
> >
> > Haproxy.err
> > Nov 1 20:00:38 127.0.0.1 haproxy[18953]: x.x.x.x:39649 [01/Nov/2011:20:00:08.665] varnish varnish/local_varnish 7412/0/22090/23/29525 200 1511 - - ---- 15700/15698/267/268/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 20:00:38 127.0.0.1 haproxy[18953]: x.x.x.x:54565 [01/Nov/2011:20:00:12.255] varnish varnish/local_varnish 3823/0/22090/23/25936 200 1511 - - ---- 15700/15698/266/267/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> >
> > In all these tests the haproxy stats show 1% idle, but "top" shows
> > haproxy using ~70% CPU?
> >
> > The "jungle", a.k.a. Amazon and its internal network, causes a lot of
> > latency when running varnish on an external machine. The response
> > times get better (~0.5 s improvement) when running varnish locally.
> > But there are still very high response times in haproxy.err when
> > running varnish locally?
> >
> > I have played around with sysctl values and found some that improved
> > my performance.
> > My feeling is that I need to tune some more values in order to go
> > beyond this level; suggestions?
> >
> > Kind regards
> > Erik
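A footnote on the log lines above: the 4519/0/73/50215/54809-style field is
haproxy's Tq/Tw/Tc/Tr/Tt timer block in milliseconds, so 3515/0/22078/10217/35811
means 22078 ms were spent just connecting to varnish (the 22 s Baptiste
mentions), and the "sH" flags on the 504 lines mean the server-side timeout
expired while haproxy was still waiting for response headers. A rough way to
see where the time goes across a whole log, assuming the syslog prefix shown
above (adjust the field number if your prefix differs):

    # print connect time (Tc), total time (Tt) and client for each request,
    # worst connect times first
    awk '{ split($10, t, "/"); print t[3], t[5], $6 }' haproxy.log | sort -rn | head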

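And a sketch of the process layout that worked best in this thread (stud
with 3 workers and a shared session cache in front of a single haproxy
process). The stud option names are from memory of the 0.2-era help output,
and the addresses, cache size and maxconn are placeholders, so double-check
against "stud -h" and your own config:

    # stud: 3 SSL worker processes, shared session cache, plain HTTP to haproxy
    stud -n 3 -C 20000 -f '*,443' -b 127.0.0.1,8080 server.pem

    # haproxy.cfg, global section: keep a single process
    global
        nbproc 1
        maxconn 40000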
