Dear all,

Sorry, my lab systems were down for many days and I could not get back on
this earlier. After new systems were allocated, I managed to get all the
requested information with a fresh run (sorry, this is a long mail too!).

There are now 4 physical servers, running Debian with kernel 3.2.0-4-amd64,
connected directly to a common switch:
server1: runs 'ab' in a container, no cpu/memory restriction.
server2: runs haproxy in a container, configured with 4 nginx backends,
         cpu/memory configured as shown below.
server3: runs 2 different nginx containers, no cpu/mem restriction.
server4: runs 2 different nginx containers (for a total of 4 nginx backends),
         no cpu/mem restriction.

The servers have 2 sockets, each with 24 cores. Socket 0 has cores
0,2,4,..,46 and socket 1 has cores 1,3,5,..,47. The NIC (ixgbe) is bound to
CPU 0. haproxy is started on cpus 2,4,6,8,10,12,14,16, i.e. on the same
socket (and thus the same L3 cache) as the NIC (nginx runs on different
servers as explained above). No tuning was done on the nginx servers, as the
comparison is between 'ab' -> nginx and 'ab' -> haproxy -> nginx(s). The cpus
are "Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz". The containers are all
configured with 8GB of memory; each server has 128GB. mpstat and iostat were
captured during the test; the capture started after 'ab' started and ended
just before 'ab' finished, so as to get "warm" numbers.

------------------------------------------------------------------------------
Request directly to 1 nginx backend server, size=256 bytes:

Command:             ab -k -n 100000 -c 1000 <nginx>:80/256
Requests per second: 69749.02 [#/sec] (mean)
Transfer rate:       34600.18 [Kbytes/sec] received

------------------------------------------------------------------------------
Request to haproxy configured with 4 nginx backends (nbproc=4), size=256 bytes:

Command:             ab -k -n 100000 -c 1000 <haproxy>:80/256
Requests per second: 19071.55 [#/sec] (mean)
Transfer rate:       9461.28 [Kbytes/sec] received

mpstat (first 4 processors only, rest are almost zero):
Average:  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
Average:  all   0.44   0.00  1.59     0.00  0.00   2.96    0.00    0.00    0.00  95.01
Average:    0   0.25   0.00  0.75     0.00  0.00  98.01    0.00    0.00    0.00   1.00
Average:    1   1.26   0.00  5.28     0.00  0.00   2.51    0.00    0.00    0.00  90.95
Average:    2   2.76   0.00  8.79     0.00  0.00   5.78    0.00    0.00    0.00  82.66
Average:    3   1.51   0.00  6.78     0.00  0.00   3.02    0.00    0.00    0.00  88.69

pidstat:
Average:  UID   PID   %usr  %system  %guest   %CPU  CPU  Command
Average:  105   471   5.00    33.50    0.00  38.50    -  haproxy
Average:  105   472   6.50    44.00    0.00  50.50    -  haproxy
Average:  105   473   8.50    40.00    0.00  48.50    -  haproxy
Average:  105   475   2.50    14.00    0.00  16.50    -  haproxy

------------------------------------------------------------------------------
Request directly to 1 nginx backend server, size=64K:

Command:             ab -k -n 100000 -c 1000 <nginx>:80/64K
Requests per second: 3342.56 [#/sec] (mean)
Transfer rate:       214759.11 [Kbytes/sec] received

------------------------------------------------------------------------------
Request to haproxy configured with 4 nginx backends (nbproc=4), size=64K:

Command:             ab -k -n 100000 -c 1000 <haproxy>:80/64K
Requests per second: 1283.62 [#/sec] (mean)
Transfer rate:       82472.35 [Kbytes/sec] received

mpstat (first 4 processors only, rest are almost zero):
Average:  CPU   %usr  %nice  %sys  %iowait  %irq   %soft  %steal  %guest  %gnice  %idle
Average:  all   0.08   0.00  0.74     0.01  0.00    2.62    0.00    0.00    0.00  96.55
Average:    0   0.00   0.00  0.00     0.00  0.00  100.00    0.00    0.00    0.00   0.00
Average:    1   1.03   0.00  9.98     0.21  0.00    7.67    0.00    0.00    0.00  81.10
Average:    2   0.70   0.00  6.32     0.00  0.00    4.50    0.00    0.00    0.00  88.48
Average:    3   0.15   0.00  2.04     0.06  0.00    1.73    0.00    0.00    0.00  96.03

pidstat:
Average:  UID   PID   %usr  %system  %guest   %CPU  CPU  Command
Average:  105   471   0.93    14.70    0.00  15.63    -  haproxy
Average:  105   472   1.12    21.55    0.00  22.67    -  haproxy
Average:  105   473   1.41    20.95    0.00  22.36    -  haproxy
Average:  105   475   0.22     4.85    0.00   5.07    -  haproxy
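Note: in both mpstat captures above, CPU 0 sits at ~98-100% in %soft while the
other cores are largely idle, so the single core servicing the ixgbe
interrupts looks like the first bottleneck to rule out. A rough sketch of how
one could check whether all NIC queues are funnelled onto CPU 0, and move one
of them to another core on the same socket (the interface name eth0 and the
IRQ number 120 are only placeholders for illustration):

    # list the ixgbe queue IRQs and the CPU mask each one is allowed to run on
    grep eth0 /proc/interrupts
    for irq in $(grep eth0 /proc/interrupts | awk -F: '{print $1}'); do
        printf 'IRQ %s -> affinity mask %s\n' "$irq" "$(cat /proc/irq/$irq/smp_affinity)"
    done

    # example only: move one queue (IRQ 120 is a made-up number) to CPU 2
    # (mask 0x4), i.e. another core on the socket the NIC is attached to
    echo 4 > /proc/irq/120/smp_affinity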
------------------------------------------------------------------------------
Build information:

HA-Proxy version 1.5.8 2014/10/31
Copyright 2000-2014 Willy Tarreau <w...@1wt.eu>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1k 8 Jan 2015
Running on OpenSSL version : OpenSSL 1.0.1k 8 Jan 2015
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.35 2014-04-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

------------------------------------------------------------------------------
Configuration file:

global
    daemon
    maxconn 60000
    quiet
    nbproc 4
    maxpipes 16384
    user haproxy
    group haproxy
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    option forwardfor
    option http-server-close
    retries 3
    option redispatch
    maxconn 60000
    option splice-auto
    option prefer-last-server
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

userlist stats-auth
    group admin users admin
    user admin insecure-password admin

frontend www-http
    bind *:80
    reqadd X-Forwarded-Proto:\ http
    default_backend www-backend

backend www-backend
    mode http
    maxconn 60000
    stats enable
    stats uri /stats
    acl AUTH http_auth(stats-auth)
    acl AUTH_ADMIN http_auth(stats-auth) admin
    stats http-request auth unless AUTH
    balance roundrobin
    option prefer-last-server
    option splice-auto
    server nginx-1 192.168.122.200:80 maxconn 15000 check
    server nginx-2 192.168.122.139:80 maxconn 15000 check
    server nginx-3 192.168.122.59:80 maxconn 15000 check
    server nginx-4 192.168.122.60:80 maxconn 15000 check

------------------------------------------------------------------------------

Both the requests/sec and the transfer rate are about 4 times worse through
haproxy. Any clue what I could do to get numbers through haproxy similar to
those from using a backend directly? BTW, I am assuming that haproxy cannot
give better numbers than a single backend system, even though haproxy is
served by multiple backends, since subsequent packets of a connection will go
to the same server - the intent is just to load balance.

Thank you for the help once again,

Regards,
- Krishna Kumar

On Wed, Apr 1, 2015 at 5:04 PM, Krishna Kumar Unnikrishnan (Engineering) <
krishna...@flipkart.com> wrote:

> (Resending, as I unintentionally dropped the ML earlier).
>
> Hi Pavlos and Willy,
>
> Thank you very much for your responses. I was silent since my development
> systems were shut down for re-racking, and I had no chance to get the
> results for Pavlos's suggestions.
>
> Thanks Willy, I will make your suggested changes (my earlier config was
> misguided, as I am very new to haproxy) and report on the findings once my
> systems are back online, hopefully tomorrow.
>
> (With the move to Lua, hope to see even "better" performance numbers!)
>
> Regards,
> - Krishna Kumar
>
>
> On Wed, Apr 1, 2015 at 3:27 PM, Willy Tarreau <w...@1wt.eu> wrote:
>
>> Hi,
>>
>> On Mon, Mar 30, 2015 at 10:43:51AM +0530, Krishna Kumar Unnikrishnan
>> (Engineering) wrote:
>> > Hi all,
>> >
>> > I am testing haproxy as follows:
>> >
>> > System1: 24 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 64 GB. This system
>> >     is running a 3.19.0 kernel, and hosts the following servers:
>> >         1. nginx1 server - cpu 1-2, 1G memory, runs as a Linux
>> >            container using the cpuset.cpus feature.
>> >         2. nginx2 server - cpu 3-4, 1G memory, runs via LXC.
>> >         3. nginx3 server - cpu 5-6, 1G memory, runs via LXC.
>> >         4. nginx4 server - cpu 7-8, 1G memory, runs via LXC.
>> >         5. haproxy       - cpu 9-10, 1G memory, runs via LXC. Runs haproxy
>> >            ver 1.5.8, configured with the above 4 containers' ip
>> >            addresses as the backend.
>> >
>> > System2: 56 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 128 GB. This system
>> >     is running 3.19.0, and runs 'ab' either to the haproxy node, or
>> >     directly to an nginx container. System1 & System2 are locally
>> >     connected via a switch with Intel 10G cards.
>> >
>> > With very small packets of 64 bytes, I am getting the following results:
>> >
>> > A. ab -n 100000 -c 4096 http://nginx1:80/64
>> > -------------------------------------------
>> >
>> > Concurrency Level:      4096
>> > Time taken for tests:   3.232 seconds
>> > Complete requests:      100000
>> > Failed requests:        0
>> > Total transferred:      28800000 bytes
>> > HTML transferred:       6400000 bytes
>> > Requests per second:    30943.26 [#/sec] (mean)
>> > Time per request:       132.371 [ms] (mean)
>> > Time per request:       0.032 [ms] (mean, across all concurrent requests)
>> > Transfer rate:          8702.79 [Kbytes/sec] received
>> >
>> > Connection Times (ms)
>> >               min  mean[+/-sd] median   max
>> > Connect:        9   65  137.4     45   1050
>> > Processing:     4   52   25.3     51    241
>> > Waiting:        3   37   19.2     35    234
>> > Total:         16  117  146.1    111   1142
>> >
>> > Percentage of the requests served within a certain time (ms)
>> >   50%    111
>> >   66%    119
>> >   75%    122
>> >   80%    124
>> >   90%    133
>> >   95%    215
>> >   98%    254
>> >   99%   1126
>> >  100%   1142 (longest request)
>> >
>> > B. ab -n 100000 -c 4096 http://haproxy:80/64
>> > --------------------------------------------
>> >
>> > Concurrency Level:      4096
>> > Time taken for tests:   5.503 seconds
>> > Complete requests:      100000
>> > Failed requests:        0
>> > Total transferred:      28800000 bytes
>> > HTML transferred:       6400000 bytes
>> > Requests per second:    18172.96 [#/sec] (mean)
>> > Time per request:       225.390 [ms] (mean)
>> > Time per request:       0.055 [ms] (mean, across all concurrent requests)
>> > Transfer rate:          5111.15 [Kbytes/sec] received
>> >
>> > Connection Times (ms)
>> >               min  mean[+/-sd] median   max
>> > Connect:        0  134  358.3     23   3033
>> > Processing:     2   61   47.7     51    700
>> > Waiting:        2   50   43.0     42    685
>> > Total:          7  194  366.7     79   3122
>> >
>> > Percentage of the requests served within a certain time (ms)
>> >   50%     79
>> >   66%    105
>> >   75%    134
>> >   80%    159
>> >   90%    318
>> >   95%   1076
>> >   98%   1140
>> >   99%   1240
>> >  100%   3122 (longest request)
>> >
>> > I expected haproxy to deliver better results with multiple connections,
>> > since haproxy will round-robin between the 4 servers. I have done no
>> > tuning, and have used the config file at the end of this mail. With 256K
>> > file size, the times are slightly better for haproxy vs nginx. I notice
>> > that the %requests served is similar for both cases till about 90%.
>>
>> I'm seeing a very simple and common explanation to this.
>> You're stressing the TCP stack and it becomes the bottleneck. Both haproxy
>> and nginx make very little use of userland and spend most of their time in
>> the kernel, so by putting both of them on the same system image, you're
>> still subject to the session table lookups, locking and whatever limits
>> the processing. And in fact, by adding haproxy in front of nginx on the
>> same system, you have effectively doubled the kernel's job, and you're
>> measuring about half of the performance, so there's nothing much
>> surprising here.
>>
>> Please check the CPU usage as Pavlos mentioned. I'm guessing that your
>> system is spending most of its time in system and/or softirq. Also, maybe
>> you have conntrack enabled on the system. In this case, having the
>> components on the same machine will triple the conntrack session rate,
>> effectively increasing its work.
>>
>> There's something you can try in your config below to see if the
>> connection rate is mostly responsible for the trouble :
>>
>> > global
>> >     maxconn 65536
>> >     ulimit-n 65536
>>
>> Please remove ulimit-n BTW, it's wrong and not needed.
>>
>> >     daemon
>> >     quiet
>> >     nbproc 2
>> >     user haproxy
>> >     group haproxy
>> >
>> > defaults
>> >     #log global
>> >     mode http
>> >     option dontlognull
>> >     retries 3
>> >     option redispatch
>> >     maxconn 65536
>> >     timeout connect 5000
>> >     timeout client 50000
>> >     timeout server 50000
>> >
>> > listen my_ha_proxy 192.168.1.110:80
>> >     mode http
>> >     stats enable
>> >     stats auth someuser:somepassword
>> >     balance roundrobin
>> >     cookie JSESSIONID prefix
>> >     option httpclose
>>
>> Here, please remove "option httpclose" and use "option prefer-last-server"
>> instead. Then comment out the cookie line for the time of a test; it will
>> try to optimize the reuse of the connections. You also need to run "ab"
>> with "-k" to enable keep-alive. If you notice an important performance
>> boost, it means that the connection setup/teardown is expensive and might
>> be responsible for your slowdown.
>>
>> Another important point is the CPU pinning. You're using a 2xxx CPU which
>> is a dual-socket one. I don't know how your logical CPUs are spread over
>> physical ones, but in general for clear HTTP, you definitely want to
>> remove any inter-CPU communications, meaning that nginx, haproxy and the
>> NIC have to be attached to physical cores of the same CPU socket. Please
>> pick the socket the NIC is physically attached to so that the network
>> traffic doesn't have to pass via the QPI link. That way you'll ensure that
>> the L3 cache is populated with the data you're using and you won't
>> experience cache misses or flushes between sessions which bounce from one
>> CPU to the other.
>>
>> It can *possibly* make sense to move the nginx instances, in their own
>> containers, to the CPU not connected to the NIC, but I honestly don't
>> know; that's what you'll have to experiment with.
>>
>> Regards,
>> Willy
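As a small follow-up to the conntrack and NIC/socket locality points above,
here is a minimal sketch of how both could be checked on each server (the
interface name eth0 is an assumption; substitute the actual ixgbe device):

    # is connection tracking loaded, and how full is its table?
    lsmod | grep nf_conntrack
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max 2>/dev/null

    # which NUMA node is the NIC attached to, and which CPUs belong to each node?
    cat /sys/class/net/eth0/device/numa_node
    lscpu | grep -i 'numa node'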