Hi,

On Mon, Mar 30, 2015 at 10:43:51AM +0530, Krishna Kumar Unnikrishnan (Engineering) wrote:
> Hi all,
>
> I am testing haproxy as follows:
>
> System1: 24 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 64 GB. This system
>     is running 3.19.0 kernel, and hosts the following servers:
>     1. nginx1 server - cpu 1-2, 1G memory, runs as a Linux
>        container using cpuset.cpus feature.
>     2. nginx2 server - cpu 3-4, 1G memory, runs via LXC.
>     3. nginx3 server - cpu 5-6, 1G memory, runs via LXC.
>     4. nginx4 server - cpu 7-8, 1G memory, runs via LXC.
>     5. haproxy - cpu 9-10, 1G memory runs via LXC. Runs haproxy
>        ver 1.5.8: configured with above 4 container's ip
>        addresses as the backend.
>
> System2: 56 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 128 GB. This system
>     is running 3.19.0, and run's 'ab' either to the haproxy node, or
>     directly to an nginx container. System1 & System2 are locally
>     connected via a switch with Intel 10G cards.
>
> With very small packets of 64 bytes, I am getting the following results:
>
> A. ab -n 100000 -c 4096 http://nginx1:80/64
>    -----------------------------------------
>
> Concurrency Level:      4096
> Time taken for tests:   3.232 seconds
> Complete requests:      100000
> Failed requests:        0
> Total transferred:      28800000 bytes
> HTML transferred:       6400000 bytes
> Requests per second:    30943.26 [#/sec] (mean)
> Time per request:       132.371 [ms] (mean)
> Time per request:       0.032 [ms] (mean, across all concurrent requests)
> Transfer rate:          8702.79 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        9   65  137.4     45    1050
> Processing:     4   52   25.3     51     241
> Waiting:        3   37   19.2     35     234
> Total:         16  117  146.1    111    1142
>
> Percentage of the requests served within a certain time (ms)
>   50%    111
>   66%    119
>   75%    122
>   80%    124
>   90%    133
>   95%    215
>   98%    254
>   99%   1126
>  100%   1142 (longest request)
>
> B. ab -n 100000 -c 4096 http://haproxy:80/64
>    ----------------------------------------------
>
> Concurrency Level:      4096
> Time taken for tests:   5.503 seconds
> Complete requests:      100000
> Failed requests:        0
> Total transferred:      28800000 bytes
> HTML transferred:       6400000 bytes
> Requests per second:    18172.96 [#/sec] (mean)
> Time per request:       225.390 [ms] (mean)
> Time per request:       0.055 [ms] (mean, across all concurrent requests)
> Transfer rate:          5111.15 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0  134  358.3     23    3033
> Processing:     2   61   47.7     51     700
> Waiting:        2   50   43.0     42     685
> Total:          7  194  366.7     79    3122
>
> Percentage of the requests served within a certain time (ms)
>   50%     79
>   66%    105
>   75%    134
>   80%    159
>   90%    318
>   95%   1076
>   98%   1140
>   99%   1240
>  100%   3122 (longest request)
>
> I expected haproxy to deliver better results with multiple connections, since
> haproxy will round-robin between the 4 servers. I have done no tuning, and
> have used the config file at the end of this mail. With 256K file size, the
> times are slightly better for haproxy vs nginx. I notice that %requests
> served is similar for both cases till about 90%.
I'm seeing a very simple and common explanation for this: you're stressing the
TCP stack and it becomes the bottleneck. Both haproxy and nginx make very
little use of userland and spend most of their time in the kernel, so by
putting both of them on the same system image, you're still subject to the
same session table lookups, locking and whatever else limits the processing.
In fact, by adding haproxy in front of nginx on the same system, you have
effectively doubled the kernel's job, and you're measuring about half of the
performance, so there's nothing very surprising here. Please check the CPU
usage as Pavlos mentioned; I'm guessing that your system is spending most of
its time in system and/or softirq.

Also, maybe you have conntrack enabled on the system. In that case, having
all the components on the same machine will triple the conntrack session
rate, effectively increasing its work.

There's something you can try in your config below to see if the connection
rate is mostly responsible for the trouble:

> global
>     maxconn 65536
>     ulimit-n 65536

Please remove ulimit-n BTW, it's wrong and not needed.

>     daemon
>     quiet
>     nbproc 2
>     user haproxy
>     group haproxy
>
> defaults
>     #log global
>     mode http
>     option dontlognull
>     retries 3
>     option redispatch
>     maxconn 65536
>     timeout connect 5000
>     timeout client 50000
>     timeout server 50000
>
> listen my_ha_proxy 192.168.1.110:80
>     mode http
>     stats enable
>     stats auth someuser:somepassword
>     balance roundrobin
>     cookie JSESSIONID prefix
>     option httpclose

Here, please remove "option httpclose" and use "option prefer-last-server"
instead, then comment out the cookie line for the duration of a test; this
will try to optimize the reuse of connections. You also need to run "ab" with
"-k" to enable keep-alive. If you notice a significant performance boost, it
means that connection setup/teardown is expensive and thus that it might be
responsible for your slowdown.

Another important point is the CPU pinning.
You're using a 2xxx CPU, which is a dual-socket part, so I don't know how your
logical CPUs are spread over the physical ones, but in general for clear-text
HTTP you definitely want to remove any inter-CPU communications, meaning that
nginx, haproxy and the NIC all have to be attached to physical cores of the
same CPU socket. Please pick the socket the NIC is physically attached to, so
that the network traffic doesn't have to pass over the QPI link. That way
you'll ensure that the L3 cache is populated with the data you're using, and
you won't experience cache misses or flushes each time a session bounces from
one CPU to the other. It can *possibly* make sense to move the nginx
instances, in their own containers, to the CPU not connected to the NIC, but I
honestly don't know; that's what you'll have to experiment with.

Regards,
Willy
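P.S. As an illustrative sketch only, the changes suggested above could look
like the fragment below. It is assembled from the config quoted earlier, not a
tested setup; in particular the core numbers 9 and 10 are just the ones from
the original cpuset and must be checked against your actual topology (lscpu,
and /sys/class/net/<nic>/device/numa_node to see which socket owns the NIC).
cpu-map is available in haproxy 1.5.

```
global
    maxconn 65536
    daemon
    nbproc 2
    user haproxy
    group haproxy
    # ulimit-n removed as advised above
    # Pin each process to a core on the socket the NIC is attached to,
    # to avoid QPI traffic and L3 cache bouncing (core ids are assumptions):
    cpu-map 1 9
    cpu-map 2 10

listen my_ha_proxy 192.168.1.110:80
    mode http
    balance roundrobin
    # cookie JSESSIONID prefix      # commented out for the duration of the test
    option prefer-last-server       # instead of "option httpclose"
```

Then benchmark with keep-alive enabled, e.g.
"ab -k -n 100000 -c 4096 http://haproxy:80/64"; a large improvement would
point at connection setup/teardown as the dominant cost.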

