(Resending, as I unintentionally dropped the ML earlier.)

Hi Pavlos and Willy,
Thank you very much for your responses. I have been silent because my
development systems were shut down for re-racking, so I had no chance to get
results for Pavlos's suggestions. Thanks Willy, I will make your suggested
changes (my settings were misguided, as I am very new to haproxy) and report
the findings once my systems are back online, hopefully tomorrow. (With the
move to Lua, I hope to see even "better" performance numbers!)

Regards,
- Krishna Kumar

On Wed, Apr 1, 2015 at 3:27 PM, Willy Tarreau <[email protected]> wrote:
> Hi,
>
> On Mon, Mar 30, 2015 at 10:43:51AM +0530, Krishna Kumar Unnikrishnan
> (Engineering) wrote:
> > Hi all,
> >
> > I am testing haproxy as follows:
> >
> > System1: 24 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 64 GB. This
> > system is running a 3.19.0 kernel and hosts the following servers:
> >     1. nginx1 server - cpu 1-2, 1G memory, runs as a Linux
> >        container using the cpuset.cpus feature.
> >     2. nginx2 server - cpu 3-4, 1G memory, runs via LXC.
> >     3. nginx3 server - cpu 5-6, 1G memory, runs via LXC.
> >     4. nginx4 server - cpu 7-8, 1G memory, runs via LXC.
> >     5. haproxy - cpu 9-10, 1G memory, runs via LXC. Runs haproxy
> >        ver 1.5.8, configured with the above 4 containers' IP
> >        addresses as the backend.
> >
> > System2: 56 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 128 GB. This
> > system is running 3.19.0 and runs 'ab' either against the haproxy node
> > or directly against an nginx container. System1 & System2 are locally
> > connected via a switch with Intel 10G cards.
> >
> > With very small packets of 64 bytes, I am getting the following results:
> >
> > A.
> > ab -n 100000 -c 4096 http://nginx1:80/64
> > -----------------------------------------
> >
> > Concurrency Level:      4096
> > Time taken for tests:   3.232 seconds
> > Complete requests:      100000
> > Failed requests:        0
> > Total transferred:      28800000 bytes
> > HTML transferred:       6400000 bytes
> > Requests per second:    30943.26 [#/sec] (mean)
> > Time per request:       132.371 [ms] (mean)
> > Time per request:       0.032 [ms] (mean, across all concurrent requests)
> > Transfer rate:          8702.79 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        9   65  137.4     45    1050
> > Processing:     4   52   25.3     51     241
> > Waiting:        3   37   19.2     35     234
> > Total:         16  117  146.1    111    1142
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%    111
> >   66%    119
> >   75%    122
> >   80%    124
> >   90%    133
> >   95%    215
> >   98%    254
> >   99%   1126
> >  100%   1142 (longest request)
> >
> > B. ab -n 100000 -c 4096 http://haproxy:80/64
> > ----------------------------------------------
> >
> > Concurrency Level:      4096
> > Time taken for tests:   5.503 seconds
> > Complete requests:      100000
> > Failed requests:        0
> > Total transferred:      28800000 bytes
> > HTML transferred:       6400000 bytes
> > Requests per second:    18172.96 [#/sec] (mean)
> > Time per request:       225.390 [ms] (mean)
> > Time per request:       0.055 [ms] (mean, across all concurrent requests)
> > Transfer rate:          5111.15 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        0  134  358.3     23    3033
> > Processing:     2   61   47.7     51     700
> > Waiting:        2   50   43.0     42     685
> > Total:          7  194  366.7     79    3122
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%     79
> >   66%    105
> >   75%    134
> >   80%    159
> >   90%    318
> >   95%   1076
> >   98%   1140
> >   99%   1240
> >  100%   3122 (longest request)
> >
> > I expected haproxy to deliver better results with multiple connections,
> > since haproxy will round-robin between the 4 servers. I have done no
> > tuning, and have used the config file at the end of this mail.
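[Editor's note: the headline figures in the two runs above can be cross-checked directly; a quick shell sketch, using the request counts and wall-clock times quoted from the `ab` output (the results agree with ab's own numbers to within rounding):]

```shell
# Cross-check the quoted ab numbers: requests/sec equals complete requests
# divided by wall-clock time, and the proxied run is ~1.7x slower.
awk 'BEGIN {
    direct  = 100000 / 3.232   # nginx1 direct (ab reports 30943.26 req/s)
    proxied = 100000 / 5.503   # via haproxy   (ab reports 18172.96 req/s)
    printf "direct=%.0f proxied=%.0f slowdown=%.2fx\n", direct, proxied, direct / proxied
}'
```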
> > With a 256K file size, the times are slightly better for haproxy vs
> > nginx. I notice that %requests served is similar for both cases until
> > about 90%.
>
> I'm seeing a very simple and common explanation for this. You're stressing
> the TCP stack and it becomes the bottleneck. Both haproxy and nginx make
> very little use of userland and spend most of their time in the kernel,
> so by putting both of them on the same system image, you're still subject
> to the session table lookups, locking and whatever limits the processing.
> And in fact, by adding haproxy in front of nginx on the same system, you
> have effectively doubled the kernel's job, and you're measuring about half
> of the performance, so there's nothing much surprising here.
>
> Please check the CPU usage as Pavlos mentioned. I'm guessing that your
> system is spending most of its time in system and/or softirq. Also, maybe
> you have conntrack enabled on the system. In this case, having the
> components on the same machine will triple the conntrack session rate,
> effectively increasing its work.
>
> There's something you can try in your config below to see if the
> connection rate is mostly responsible for the trouble:
>
> > global
> >     maxconn 65536
> >     ulimit-n 65536
>
> Please remove ulimit-n BTW, it's wrong and not needed.
>
> >     daemon
> >     quiet
> >     nbproc 2
> >     user haproxy
> >     group haproxy
> >
> > defaults
> >     #log global
> >     mode http
> >     option dontlognull
> >     retries 3
> >     option redispatch
> >     maxconn 65536
> >     timeout connect 5000
> >     timeout client 50000
> >     timeout server 50000
> >
> > listen my_ha_proxy 192.168.1.110:80
> >     mode http
> >     stats enable
> >     stats auth someuser:somepassword
> >     balance roundrobin
> >     cookie JSESSIONID prefix
> >     option httpclose
>
> Here, please remove "option httpclose" and use "option prefer-last-server"
> instead. Then comment out the cookie line for the duration of the test; it
> will try to optimize the reuse of the connections.
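[Editor's note: with Willy's two changes applied, the listen section from the quoted config might look like this; a sketch only, untested, and "option prefer-last-server" requires haproxy 1.5+, which matches the 1.5.8 in use here:]

```
listen my_ha_proxy 192.168.1.110:80
    mode http
    stats enable
    stats auth someuser:somepassword
    balance roundrobin
    # cookie JSESSIONID prefix   # commented out for the duration of the test
    # option httpclose           # removed: forces a connection close per request
    option prefer-last-server    # prefer reusing the last server's connection
    # (backend server lines unchanged; they are not shown in the quoted config)
```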
> You also need to run "ab" with "-k" to enable keep-alive. If you notice an
> important performance boost, it means that the connection setup/teardown
> is expensive and that it might be responsible for your slowdown.
>
> Another important point is the CPU pinning. You're using a 2xxx CPU which
> is a dual-socket one. So I don't know how your logical CPUs are spread
> over physical ones, but in general for clear HTTP, you definitely want to
> remove any inter-CPU communications, meaning that nginx, haproxy and the
> NIC have to be attached to physical cores of the same CPU socket. Please
> pick the socket the NIC is physically attached to so that the network
> traffic doesn't have to pass via the QPI link. That way you'll ensure that
> the L3 cache will be populated with the data you're using, and you won't
> experience cache misses nor flushes between each session which bounces
> from one CPU to the other one.
>
> It can *possibly* make sense to move the nginx instances, in their own
> containers, to the CPU not connected to the NIC, but I honestly don't
> know; that's what you'll have to experiment with.
>
> Regards,
> Willy
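[Editor's note: the keep-alive and pinning advice above translates into commands along these lines; a sketch in which the hostname `haproxy`, the interface name `eth0`, and cores 9-10 are assumptions taken from this thread, and the function only prints the commands so nothing runs by accident:]

```shell
# Print, rather than run, the follow-up steps suggested above.
suggest_followups() {
    # 1. Re-run the benchmark with HTTP keep-alive enabled
    echo 'ab -k -n 100000 -c 4096 http://haproxy:80/64'
    # 2. Find which NUMA node the 10G NIC sits on (eth0 is an assumption)
    echo 'cat /sys/class/net/eth0/device/numa_node'
    # 3. Pin every haproxy process (nbproc 2) to the cores used in the test
    echo 'for p in $(pidof haproxy); do taskset -cp 9,10 "$p"; done'
}
suggest_followups
```

If the NUMA node printed in step 2 does not contain cores 9-10, the cpuset assignments for the haproxy and nginx containers would need to move to cores on the NIC's socket, per Willy's note about the QPI link.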

