Hi,
you should switch net.ipv4.tcp_tw_recycle off; you already have
tcp_tw_reuse on, which serves the same purpose and is less dangerous with
NATted clients. http://www.serverphorums.com/read.php?10,182544

Lukas
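A minimal sketch of the change suggested above, assuming the settings are
kept in a sysctl drop-in file; the file name is only an example:

    # /etc/sysctl.d/99-timewait.conf (example path)
    # keep reuse of TIME_WAIT sockets for outgoing connections
    net.ipv4.tcp_tw_reuse = 1
    # turn recycle off; it breaks clients sitting behind NAT
    net.ipv4.tcp_tw_recycle = 0

Apply it with "sysctl -p /etc/sysctl.d/99-timewait.conf", or try it live
first with "sysctl -w net.ipv4.tcp_tw_recycle=0".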
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: RE: Haproxy timing issues
> Date: Wed, 2 Nov 2011 18:17:58 +0000
>
> Hi,
>
> Yeah, the clients are not the problem; we are using 5 different
> datacenters with 5 machines each, so ~25 machines. Hardcore loadtesting :)
> Btw, the loadtests are done transatlantic, so that is adding latency etc.
>
> After some more testing yesterday we found exactly what you mentioned
> here: using stud with too many processes made the results much worse.
> The perfect setup turned out to be stud with n=3 and haproxy nbproc=1.
> Increasing n to 4, 5, 6, ... made the results much worse.
>
> When I got these results I was using stud with n=6, which caused a lot of
> response time problems. However, I don't see these response times in the
> haproxy logs now when running with n=3. So how could stud with n=6 affect
> the response time on the "backend" in the haproxy logs?
>
> We are currently using the latest version of stud from github,
> bumptech-stud-0.2-76-g8012fe3. Are the "emericbr patches" merged in
> there, or is that a fork?
>
> The loadtest client is doing a renegotiation for every connection. The
> scenario contains 3 small images.
> Each connection makes 3 requests, repeated 10 times, with a 3-7 s wait
> between requests.
> This is to keep the connection open as long as possible and get many
> active connections. (We also have scenarios doing a lot of conns/s etc.)
>
> Yeah, the Aloha would have been cool to test. But this is not for us,
> this is for a customer :)
>
> These are my main sysctl values, which gave me a visible performance
> improvement:
> net.ipv4.tcp_max_syn_backlog=262144
> net.ipv4.tcp_syncookies=0
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_no_metrics_save=1
> net.core.somaxconn=262144
>
> net.ipv4.ip_local_port_range=1024 65536
> net.ipv4.tcp_tw_recycle=1
>
> These are some more I have tried, but they did not give me as much
> improvement:
> #net.ipv4.tcp_rmem=4096 87380 16777216
> #net.ipv4.tcp_wmem=4096 65536 16777216
> #net.ipv4.tcp_fin_timeout = 3
> #net.ipv4.tcp_max_orphans = 262144
> #net.ipv4.tcp_synack_retries = 2
> #net.ipv4.tcp_syn_retries = 2
>
> #net.core.rmem_max=16777216
> #net.core.wmem_max=16777216
> #net.core.netdev_max_backlog = 262144
>
> /E
>
>
> -----Original Message-----
> From: Baptiste [mailto:[email protected]]
> Sent: 1 November 2011 16:08
> To: Erik Torlen
> Cc: [email protected]
> Subject: Re: Haproxy timing issues
>
> Hi,
>
> First question: are you sure you're reaching the limit of haproxy/varnish
> and not the limit of your client?
> Mainly concerning the increasing response times.
>
> How many CPUs do you have in your VM? Starting too many stud processes
> could be counter-productive.
> I doubt doing CPU affinity in a VM improves anything :)
>
> Concerning the logs, the times we can see on your client side are very
> high! Too high :)
> 3-4 s for HAProxy to get the full request.
>
> How are you running stud?
> Which options? Are you using the one with the emericbr patches?
> Are you requesting with the same SSL session ID, or do you negotiate a
> new one for each connection?
>
> Have you checked your network statistics, on both the client and server
> side? (netstat -in and netstat -s)
> Are there a lot of drops, retransmissions, congestion, etc.?
>
> On your last log line, we can see that HAProxy took 22s to establish a
> TCP connection to your Varnish...
>
> Can you share your stud, haproxy, and varnish configurations, the version
> of each, and the startup parameters for Varnish?
> What kind of tool do you use on your client to run your load test?
> Which sysctls have you already tuned?
>
> Unfortunately, the Aloha does not run on Amazon :)
>
> cheers,
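As an aside, a minimal way to pull out the network statistics Baptiste is
asking about, assuming standard Linux net-tools (the grep pattern is only
an example; counter wording differs between kernel versions):

    # per-interface errors/drops on the client and on the stud/haproxy box
    netstat -in
    # protocol counters; watch for retransmits, listen drops and overflows
    netstat -s | grep -iE 'retrans|drop|overflow|prune|listen'

Running the commands twice, a few seconds apart, and comparing the output
makes it easier to see which counters are actually moving during the test.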
> On Tue, Nov 1, 2011 at 9:16 PM, Erik Torlen <[email protected]> wrote:
> > Hi,
> >
> > I am currently (and have been from time to time over the last weeks)
> > doing some heavy loadtesting against haproxy, with stud in front of it
> > handling the SSL.
> >
> > My loadtest has been focused on SSL traffic through stud against
> > haproxy on Amazon EC2.
> >
> > Our current problem is that we cannot get more than ~30k active
> > connections (~150 conns/s) before we start to see increased response
> > times (10-60 s) on the client side. Running with 38k connections now
> > and seeing much higher response times.
> >
> > The setup is:
> > 1 instance running haproxy + stud
> > 2 instances running varnish, serving 3 cached images
> >
> > Varnish has a 100% cache hit ratio, so nothing goes to the backend.
> >
> > We have tried using m1.xlarge and c1.xlarge. The m1.xlarge uses almost
> > 100% CPU during the loadtests, while the c1.xlarge has a lot of
> > resources left (stud using a few percent per process) and haproxy at
> > ~60-70% CPU.
> > The only difference is that the c1.xlarge gives noticeably better
> > response times before the actual problem, where response times start
> > increasing.
> >
> > Haproxy is running with nbproc=1.
> > Stud is running with n=6 and a shared session cache. (Tried it with
> > n=3 as well.)
> >
> > From the haproxy logging I could see the time it takes to establish a
> > connection to the backend and receive the data:
> >
> > Haproxy.log
> > Nov 1 18:40:35 127.0.0.1 haproxy[18511]: x.x.x.x:54113 [01/Nov/2011:18:39:40.273] varnish varnish/varnish1 4519/0/73/50215/54809 200 2715 - - ---- 238/236/4/5/0 0/0 "GET /assets/images/icons/elite_logo_beta.png HTTP/1.1"
> > Nov 1 18:40:35 127.0.0.1 haproxy[18511]: x.x.x.x:55635 [01/Nov/2011:18:39:41.547] varnish varnish/varnish1 3245/0/81/50207/53535 200 1512 - - ---- 238/236/3/4/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > ...
> > Nov 1 18:40:44 127.0.0.1 haproxy[18511]: x.x.x.x:34453 [01/Nov/2011:18:39:25.330] varnish varnish/varnish1 3082/0/225/32661/79559 200 1512 - - ---- 234/232/1/2/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 18:40:44 127.0.0.1 haproxy[18511]: x.x.x.x:53731 [01/Nov/2011:18:39:25.036] varnish varnish/varnish1 3377/0/216/32669/79854 200 1725 - - ---- 233/231/0/1/0 0/0 "GET /assets/images/create/action_btn.png HTTP/1.1"
> >
> > Haproxy.err (NOTE: 504 errors here)
> >
> > Nov 1 18:40:11 127.0.0.1 haproxy[18511]: x.x.x.x:34885 [01/Nov/2011:18:39:07.597] varnish varnish/varnish1 4299/0/27/-1/64330 504 194 - - sH-- 10916/10914/4777/2700/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 18:40:12 127.0.0.1 haproxy[18511]: x.x.x.x:58878 [01/Nov/2011:18:39:12.621] varnish varnish/varnish2 314/0/55/-1/60374 504 194 - - sH-- 3692/3690/3392/1623/0 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> >
> > Nov 1 18:40:18 127.0.0.1 haproxy[18511]: x.x.x.x:35505 [01/Nov/2011:18:39:42.670] varnish varnish/varnish1 3515/0/22078/10217/35811 200 1512 - - ---- 1482/1481/1238/710/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 18:40:18 127.0.0.1 haproxy[18511]: x.x.x.x:40602 [01/Nov/2011:18:39:42.056] varnish varnish/varnish1 4126/0/22081/10226/36435 200 1512 - - ---- 1475/1474/1231/703/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> >
> > Here are the logs from running haproxy with varnish as a backend on
> > the local machine:
> >
> > Haproxy.log
> > Nov 1 20:00:52 127.0.0.1 haproxy[18953]: x.x.x.x:38552 [01/Nov/2011:20:00:45.157] varnish varnish/local_varnish 7513/0/0/0/7513 200 1725 - - ---- 4/3/0/1/0 0/0 "GET /assets/images/create/action_btn.png HTTP/1.1"
> > Nov 1 20:00:54 127.0.0.1 haproxy[18953]: x.x.x.x:40850 [01/Nov/2011:20:00:48.219] varnish varnish/local_varnish 6524/0/0/0/6524 200 1725 - - ---- 2/1/0/1/0 0/0 "GET /assets/images/create/action_btn.png HTTP/1.1"
> >
> > Haproxy.err
> > Nov 1 20:00:38 127.0.0.1 haproxy[18953]: x.x.x.x:39649 [01/Nov/2011:20:00:08.665] varnish varnish/local_varnish 7412/0/22090/23/29525 200 1511 - - ---- 15700/15698/267/268/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> > Nov 1 20:00:38 127.0.0.1 haproxy[18953]: x.x.x.x:54565 [01/Nov/2011:20:00:12.255] varnish varnish/local_varnish 3823/0/22090/23/25936 200 1511 - - ---- 15700/15698/266/267/1 0/0 "GET /assets/images/icons/favicon.ico HTTP/1.1"
> >
> > In all these tests the haproxy stats show 1% idle, but "top" shows
> > haproxy using ~70% CPU?
> >
> > The "jungle", a.k.a. Amazon and its internal network, causes a lot of
> > latency when running varnish on an external machine. The response
> > times get better (~0.5 s improvement) when running varnish locally.
> > But there are still very high response times in haproxy.err when
> > running varnish locally?
> >
> > I have played around with sysctl values and found some that improved
> > my performance.
> > My feeling is that I need to tune some more values in order to go
> > beyond this level; suggestions?
> >
> > Kind regards
> > Erik
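A footnote on the log lines above: the 4519/0/73/50215/54809-style field is
haproxy's Tq/Tw/Tc/Tr/Tt timer block in milliseconds, so 3515/0/22078/10217/35811
means 22078 ms were spent just connecting to varnish (the 22 s Baptiste
mentions), and the "sH" flags on the 504 lines mean the server-side timeout
expired while haproxy was still waiting for response headers. A rough way to
see where the time goes across a whole log, assuming the syslog prefix shown
above (adjust the field number if your prefix differs):

    # print connect time (Tc), total time (Tt) and client for each request,
    # worst connect times first
    awk '{ split($10, t, "/"); print t[3], t[5], $6 }' haproxy.log | sort -rn | head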

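And a sketch of the process layout that worked best in this thread (stud
with 3 workers and a shared session cache in front of a single haproxy
process). The stud option names are from memory of the 0.2-era help output,
and the addresses, cache size and maxconn are placeholders, so double-check
against "stud -h" and your own config:

    # stud: 3 SSL worker processes, shared session cache, plain HTTP to haproxy
    stud -n 3 -C 20000 -f '*,443' -b 127.0.0.1,8080 server.pem

    # haproxy.cfg, global section: keep a single process
    global
        nbproc 1
        maxconn 40000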
