Hi,

On Mon, Mar 30, 2015 at 10:43:51AM +0530, Krishna Kumar Unnikrishnan (Engineering) wrote:
> Hi all,
>
> I am testing haproxy as follows:
>
> System1: 24 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 64 GB. This system
>     is running 3.19.0 kernel, and hosts the following servers:
>     1. nginx1 server - cpu 1-2, 1G memory, runs as a Linux
>        container using cpuset.cpus feature.
>     2. nginx2 server - cpu 3-4, 1G memory, runs via LXC.
>     3. nginx3 server - cpu 5-6, 1G memory, runs via LXC.
>     4. nginx4 server - cpu 7-8, 1G memory, runs via LXC.
>     5. haproxy - cpu 9-10, 1G memory runs via LXC. Runs haproxy
>        ver 1.5.8: configured with above 4 container's ip
>        addresses as the backend.
>
> System2: 56 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 128 GB. This system
>     is running 3.19.0, and run's 'ab' either to the haproxy node, or
>     directly to an nginx container. System1 & System2 are locally
>     connected via a switch with Intel 10G cards.
>
> With very small packets of 64 bytes, I am getting the following results:
>
> A. ab -n 100000 -c 4096 http://nginx1:80/64
>    -----------------------------------------
>
> Concurrency Level:      4096
> Time taken for tests:   3.232 seconds
> Complete requests:      100000
> Failed requests:        0
> Total transferred:      28800000 bytes
> HTML transferred:       6400000 bytes
> Requests per second:    30943.26 [#/sec] (mean)
> Time per request:       132.371 [ms] (mean)
> Time per request:       0.032 [ms] (mean, across all concurrent requests)
> Transfer rate:          8702.79 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        9   65  137.4     45    1050
> Processing:     4   52   25.3     51     241
> Waiting:        3   37   19.2     35     234
> Total:         16  117  146.1    111    1142
>
> Percentage of the requests served within a certain time (ms)
>   50%    111
>   66%    119
>   75%    122
>   80%    124
>   90%    133
>   95%    215
>   98%    254
>   99%   1126
>  100%   1142 (longest request)
>
> B. ab -n 100000 -c 4096 http://haproxy:80/64
>    ----------------------------------------------
>
> Concurrency Level:      4096
> Time taken for tests:   5.503 seconds
> Complete requests:      100000
> Failed requests:        0
> Total transferred:      28800000 bytes
> HTML transferred:       6400000 bytes
> Requests per second:    18172.96 [#/sec] (mean)
> Time per request:       225.390 [ms] (mean)
> Time per request:       0.055 [ms] (mean, across all concurrent requests)
> Transfer rate:          5111.15 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0  134  358.3     23    3033
> Processing:     2   61   47.7     51     700
> Waiting:        2   50   43.0     42     685
> Total:          7  194  366.7     79    3122
>
> Percentage of the requests served within a certain time (ms)
>   50%     79
>   66%    105
>   75%    134
>   80%    159
>   90%    318
>   95%   1076
>   98%   1140
>   99%   1240
>  100%   3122 (longest request)
>
> I expected haproxy to deliver better results with multiple connections, since
> haproxy will round-robin between the 4 servers. I have done no tuning, and
> have used the config file at the end of this mail. With 256K file size, the
> times are slightly better for haproxy vs nginx. I notice that %requests
> served is similar for both cases till about 90%.
I'm seeing a very simple and common explanation for this: you're stressing the
TCP stack and it becomes the bottleneck. Both haproxy and nginx make very
little use of userland and spend most of their time in the kernel, so by
putting both of them on the same system image, you're still subject to the
same session table lookups, locking and whatever else limits the processing.
In fact, by adding haproxy in front of nginx on the same system, you have
effectively doubled the kernel's job, and you're measuring about half of the
performance, so there's nothing very surprising here. Please check the CPU
usage as Pavlos mentioned; I'm guessing that your system is spending most of
its time in system and/or softirq.

Also, maybe you have conntrack enabled on the system. In that case, having
all the components on the same machine will triple the conntrack session
rate, effectively increasing its work.

There's something you can try in your config below to see if the connection
rate is mostly responsible for the trouble:

> global
>     maxconn 65536
>     ulimit-n 65536

Please remove ulimit-n BTW, it's wrong and not needed.

>     daemon
>     quiet
>     nbproc 2
>     user haproxy
>     group haproxy
>
> defaults
>     #log global
>     mode http
>     option dontlognull
>     retries 3
>     option redispatch
>     maxconn 65536
>     timeout connect 5000
>     timeout client 50000
>     timeout server 50000
>
> listen my_ha_proxy 192.168.1.110:80
>     mode http
>     stats enable
>     stats auth someuser:somepassword
>     balance roundrobin
>     cookie JSESSIONID prefix
>     option httpclose

Here, please remove "option httpclose" and use "option prefer-last-server"
instead, then comment out the cookie line for the duration of a test; this
will try to optimize the reuse of connections. You also need to run "ab" with
"-k" to enable keep-alive. If you notice a significant performance boost, it
means that connection setup/teardown is expensive and thus that it might be
responsible for your slowdown.

Another important point is the CPU pinning.
You're using a 2xxx CPU, which is a dual-socket part, so I don't know how your
logical CPUs are spread over the physical ones, but in general for clear-text
HTTP you definitely want to remove any inter-CPU communications, meaning that
nginx, haproxy and the NIC all have to be attached to physical cores of the
same CPU socket. Please pick the socket the NIC is physically attached to, so
that the network traffic doesn't have to pass over the QPI link. That way
you'll ensure that the L3 cache is populated with the data you're using, and
you won't experience cache misses or flushes each time a session bounces from
one CPU to the other. It can *possibly* make sense to move the nginx
instances, in their own containers, to the CPU not connected to the NIC, but I
honestly don't know; that's what you'll have to experiment with.

Regards,
Willy
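P.S. As an illustrative sketch only, the changes suggested above could look
like the fragment below. It is assembled from the config quoted earlier, not a
tested setup; in particular the core numbers 9 and 10 are just the ones from
the original cpuset and must be checked against your actual topology (lscpu,
and /sys/class/net/<nic>/device/numa_node to see which socket owns the NIC).
cpu-map is available in haproxy 1.5.

```
global
    maxconn 65536
    daemon
    nbproc 2
    user haproxy
    group haproxy
    # ulimit-n removed as advised above
    # Pin each process to a core on the socket the NIC is attached to,
    # to avoid QPI traffic and L3 cache bouncing (core ids are assumptions):
    cpu-map 1 9
    cpu-map 2 10

listen my_ha_proxy 192.168.1.110:80
    mode http
    balance roundrobin
    # cookie JSESSIONID prefix      # commented out for the duration of the test
    option prefer-last-server       # instead of "option httpclose"
```

Then benchmark with keep-alive enabled, e.g.
"ab -k -n 100000 -c 4096 http://haproxy:80/64"; a large improvement would
point at connection setup/teardown as the dominant cost.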

