Dear all,

Sorry, my lab systems were down for many days and I could not get back on
this earlier. After new systems were allocated, I managed to get all the
requested information with a fresh run (sorry, this is a long mail too!).

There are now 4 physical servers, running Debian with kernel 3.2.0-4-amd64,
connected directly to a common switch:
server1: runs 'ab' in a container, no cpu/memory restriction.
server2: runs haproxy in a container, configured with 4 nginx backends,
         cpu/memory configured as shown below.
server3: runs 2 different nginx containers, no cpu/mem restriction.
server4: runs 2 different nginx containers (for a total of 4 nginx backends),
         no cpu/mem restriction.

The servers have 2 sockets, each with 24 cores. Socket 0 has cores
0,2,4,..,46 and socket 1 has cores 1,3,5,..,47. The NIC (ixgbe) is bound to
CPU 0. haproxy is started on cpus 2,4,6,8,10,12,14,16, i.e. on the same
socket (and thus the same L3 cache) as the NIC (nginx runs on different
servers as explained above). No tuning was done on the nginx servers, as the
comparison is between 'ab' -> nginx and 'ab' -> haproxy -> nginx(s). The cpus
are "Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz". The containers are all
configured with 8GB of memory; each server has 128GB. mpstat and iostat were
captured during the test; the capture started after 'ab' started and ended
just before 'ab' finished, so as to get "warm" numbers.

------------------------------------------------------------------------------
Request directly to 1 nginx backend server, size=256 bytes:

Command:             ab -k -n 100000 -c 1000 <nginx>:80/256
Requests per second: 69749.02 [#/sec] (mean)
Transfer rate:       34600.18 [Kbytes/sec] received

------------------------------------------------------------------------------
Request to haproxy configured with 4 nginx backends (nbproc=4), size=256 bytes:

Command:             ab -k -n 100000 -c 1000 <haproxy>:80/256
Requests per second: 19071.55 [#/sec] (mean)
Transfer rate:       9461.28 [Kbytes/sec] received

mpstat (first 4 processors only, rest are almost zero):
Average:  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
Average:  all   0.44   0.00  1.59     0.00  0.00   2.96    0.00    0.00    0.00  95.01
Average:    0   0.25   0.00  0.75     0.00  0.00  98.01    0.00    0.00    0.00   1.00
Average:    1   1.26   0.00  5.28     0.00  0.00   2.51    0.00    0.00    0.00  90.95
Average:    2   2.76   0.00  8.79     0.00  0.00   5.78    0.00    0.00    0.00  82.66
Average:    3   1.51   0.00  6.78     0.00  0.00   3.02    0.00    0.00    0.00  88.69

pidstat:
Average:  UID   PID   %usr  %system  %guest   %CPU  CPU  Command
Average:  105   471   5.00    33.50    0.00  38.50    -  haproxy
Average:  105   472   6.50    44.00    0.00  50.50    -  haproxy
Average:  105   473   8.50    40.00    0.00  48.50    -  haproxy
Average:  105   475   2.50    14.00    0.00  16.50    -  haproxy

------------------------------------------------------------------------------
Request directly to 1 nginx backend server, size=64K:

Command:             ab -k -n 100000 -c 1000 <nginx>:80/64K
Requests per second: 3342.56 [#/sec] (mean)
Transfer rate:       214759.11 [Kbytes/sec] received

------------------------------------------------------------------------------
Request to haproxy configured with 4 nginx backends (nbproc=4), size=64K:

Command:             ab -k -n 100000 -c 1000 <haproxy>:80/64K
Requests per second: 1283.62 [#/sec] (mean)
Transfer rate:       82472.35 [Kbytes/sec] received

mpstat (first 4 processors only, rest are almost zero):
Average:  CPU   %usr  %nice  %sys  %iowait  %irq   %soft  %steal  %guest  %gnice  %idle
Average:  all   0.08   0.00  0.74     0.01  0.00    2.62    0.00    0.00    0.00  96.55
Average:    0   0.00   0.00  0.00     0.00  0.00  100.00    0.00    0.00    0.00   0.00
Average:    1   1.03   0.00  9.98     0.21  0.00    7.67    0.00    0.00    0.00  81.10
Average:    2   0.70   0.00  6.32     0.00  0.00    4.50    0.00    0.00    0.00  88.48
Average:    3   0.15   0.00  2.04     0.06  0.00    1.73    0.00    0.00    0.00  96.03

pidstat:
Average:  UID   PID   %usr  %system  %guest   %CPU  CPU  Command
Average:  105   471   0.93    14.70    0.00  15.63    -  haproxy
Average:  105   472   1.12    21.55    0.00  22.67    -  haproxy
Average:  105   473   1.41    20.95    0.00  22.36    -  haproxy
Average:  105   475   0.22     4.85    0.00   5.07    -  haproxy
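Note: in both mpstat captures above, CPU 0 sits at ~98-100% in %soft while the
other cores are largely idle, so the single core servicing the ixgbe
interrupts looks like the first bottleneck to rule out. A rough sketch of how
one could check whether all NIC queues are funnelled onto CPU 0, and move one
of them to another core on the same socket (the interface name eth0 and the
IRQ number 120 are only placeholders for illustration):

    # list the ixgbe queue IRQs and the CPU mask each one is allowed to run on
    grep eth0 /proc/interrupts
    for irq in $(grep eth0 /proc/interrupts | awk -F: '{print $1}'); do
        printf 'IRQ %s -> affinity mask %s\n' "$irq" "$(cat /proc/irq/$irq/smp_affinity)"
    done

    # example only: move one queue (IRQ 120 is a made-up number) to CPU 2
    # (mask 0x4), i.e. another core on the socket the NIC is attached to
    echo 4 > /proc/irq/120/smp_affinity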
------------------------------------------------------------------------------
Build information:

HA-Proxy version 1.5.8 2014/10/31
Copyright 2000-2014 Willy Tarreau <w...@1wt.eu>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1k 8 Jan 2015
Running on OpenSSL version : OpenSSL 1.0.1k 8 Jan 2015
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.35 2014-04-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

------------------------------------------------------------------------------
Configuration file:

global
    daemon
    maxconn 60000
    quiet
    nbproc 4
    maxpipes 16384
    user haproxy
    group haproxy
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    option forwardfor
    option http-server-close
    retries 3
    option redispatch
    maxconn 60000
    option splice-auto
    option prefer-last-server
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

userlist stats-auth
    group admin users admin
    user admin insecure-password admin

frontend www-http
    bind *:80
    reqadd X-Forwarded-Proto:\ http
    default_backend www-backend

backend www-backend
    mode http
    maxconn 60000
    stats enable
    stats uri /stats
    acl AUTH http_auth(stats-auth)
    acl AUTH_ADMIN http_auth(stats-auth) admin
    stats http-request auth unless AUTH
    balance roundrobin
    option prefer-last-server
    option splice-auto
    server nginx-1 192.168.122.200:80 maxconn 15000 check
    server nginx-2 192.168.122.139:80 maxconn 15000 check
    server nginx-3 192.168.122.59:80 maxconn 15000 check
    server nginx-4 192.168.122.60:80 maxconn 15000 check

------------------------------------------------------------------------------

Both the requests/sec and the transfer rate are about 4 times worse through
haproxy. Any clue what I could do to get numbers through haproxy similar to
those from using a backend directly? BTW, I am assuming that haproxy cannot
give better numbers than a single backend system, even though haproxy is
served by multiple backends, since subsequent packets of a connection will go
to the same server - the intent is just to load balance.

Thank you for the help once again,

Regards,
- Krishna Kumar

On Wed, Apr 1, 2015 at 5:04 PM, Krishna Kumar Unnikrishnan (Engineering) <
krishna...@flipkart.com> wrote:

> (Resending, as I unintentionally dropped the ML earlier).
>
> Hi Pavlos and Willy,
>
> Thank you very much for your responses. I was silent since my development
> systems were shut down for re-racking, and I had no chance to get the
> results for Pavlos's suggestions.
>
> Thanks Willy, I will make your suggested changes (my earlier config was
> misguided, as I am very new to haproxy) and report on the findings once my
> systems are back online, hopefully tomorrow.
>
> (With the move to Lua, hope to see even "better" performance numbers!)
>
> Regards,
> - Krishna Kumar
>
>
> On Wed, Apr 1, 2015 at 3:27 PM, Willy Tarreau <w...@1wt.eu> wrote:
>
>> Hi,
>>
>> On Mon, Mar 30, 2015 at 10:43:51AM +0530, Krishna Kumar Unnikrishnan
>> (Engineering) wrote:
>> > Hi all,
>> >
>> > I am testing haproxy as follows:
>> >
>> > System1: 24 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 64 GB. This system
>> >     is running a 3.19.0 kernel, and hosts the following servers:
>> >         1. nginx1 server - cpu 1-2, 1G memory, runs as a Linux
>> >            container using the cpuset.cpus feature.
>> >         2. nginx2 server - cpu 3-4, 1G memory, runs via LXC.
>> >         3. nginx3 server - cpu 5-6, 1G memory, runs via LXC.
>> >         4. nginx4 server - cpu 7-8, 1G memory, runs via LXC.
>> >         5. haproxy       - cpu 9-10, 1G memory, runs via LXC. Runs haproxy
>> >            ver 1.5.8, configured with the above 4 containers' ip
>> >            addresses as the backend.
>> >
>> > System2: 56 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 128 GB. This system
>> >     is running 3.19.0, and runs 'ab' either to the haproxy node, or
>> >     directly to an nginx container. System1 & System2 are locally
>> >     connected via a switch with Intel 10G cards.
>> >
>> > With very small packets of 64 bytes, I am getting the following results:
>> >
>> > A. ab -n 100000 -c 4096 http://nginx1:80/64
>> > -------------------------------------------
>> >
>> > Concurrency Level:      4096
>> > Time taken for tests:   3.232 seconds
>> > Complete requests:      100000
>> > Failed requests:        0
>> > Total transferred:      28800000 bytes
>> > HTML transferred:       6400000 bytes
>> > Requests per second:    30943.26 [#/sec] (mean)
>> > Time per request:       132.371 [ms] (mean)
>> > Time per request:       0.032 [ms] (mean, across all concurrent requests)
>> > Transfer rate:          8702.79 [Kbytes/sec] received
>> >
>> > Connection Times (ms)
>> >               min  mean[+/-sd] median   max
>> > Connect:        9   65  137.4     45   1050
>> > Processing:     4   52   25.3     51    241
>> > Waiting:        3   37   19.2     35    234
>> > Total:         16  117  146.1    111   1142
>> >
>> > Percentage of the requests served within a certain time (ms)
>> >   50%    111
>> >   66%    119
>> >   75%    122
>> >   80%    124
>> >   90%    133
>> >   95%    215
>> >   98%    254
>> >   99%   1126
>> >  100%   1142 (longest request)
>> >
>> > B. ab -n 100000 -c 4096 http://haproxy:80/64
>> > --------------------------------------------
>> >
>> > Concurrency Level:      4096
>> > Time taken for tests:   5.503 seconds
>> > Complete requests:      100000
>> > Failed requests:        0
>> > Total transferred:      28800000 bytes
>> > HTML transferred:       6400000 bytes
>> > Requests per second:    18172.96 [#/sec] (mean)
>> > Time per request:       225.390 [ms] (mean)
>> > Time per request:       0.055 [ms] (mean, across all concurrent requests)
>> > Transfer rate:          5111.15 [Kbytes/sec] received
>> >
>> > Connection Times (ms)
>> >               min  mean[+/-sd] median   max
>> > Connect:        0  134  358.3     23   3033
>> > Processing:     2   61   47.7     51    700
>> > Waiting:        2   50   43.0     42    685
>> > Total:          7  194  366.7     79   3122
>> >
>> > Percentage of the requests served within a certain time (ms)
>> >   50%     79
>> >   66%    105
>> >   75%    134
>> >   80%    159
>> >   90%    318
>> >   95%   1076
>> >   98%   1140
>> >   99%   1240
>> >  100%   3122 (longest request)
>> >
>> > I expected haproxy to deliver better results with multiple connections,
>> > since haproxy will round-robin between the 4 servers. I have done no
>> > tuning, and have used the config file at the end of this mail. With 256K
>> > file size, the times are slightly better for haproxy vs nginx. I notice
>> > that the %requests served is similar for both cases till about 90%.
>>
>> I'm seeing a very simple and common explanation to this.
>> You're stressing the TCP stack and it becomes the bottleneck. Both haproxy
>> and nginx make very little use of userland and spend most of their time in
>> the kernel, so by putting both of them on the same system image, you're
>> still subject to the session table lookups, locking and whatever limits
>> the processing. And in fact, by adding haproxy in front of nginx on the
>> same system, you have effectively doubled the kernel's job, and you're
>> measuring about half of the performance, so there's nothing much
>> surprising here.
>>
>> Please check the CPU usage as Pavlos mentioned. I'm guessing that your
>> system is spending most of its time in system and/or softirq. Also, maybe
>> you have conntrack enabled on the system. In this case, having the
>> components on the same machine will triple the conntrack session rate,
>> effectively increasing its work.
>>
>> There's something you can try in your config below to see if the
>> connection rate is mostly responsible for the trouble :
>>
>> > global
>> >     maxconn 65536
>> >     ulimit-n 65536
>>
>> Please remove ulimit-n BTW, it's wrong and not needed.
>>
>> >     daemon
>> >     quiet
>> >     nbproc 2
>> >     user haproxy
>> >     group haproxy
>> >
>> > defaults
>> >     #log global
>> >     mode http
>> >     option dontlognull
>> >     retries 3
>> >     option redispatch
>> >     maxconn 65536
>> >     timeout connect 5000
>> >     timeout client 50000
>> >     timeout server 50000
>> >
>> > listen my_ha_proxy 192.168.1.110:80
>> >     mode http
>> >     stats enable
>> >     stats auth someuser:somepassword
>> >     balance roundrobin
>> >     cookie JSESSIONID prefix
>> >     option httpclose
>>
>> Here, please remove "option httpclose" and use "option prefer-last-server"
>> instead. Then comment out the cookie line for the time of a test; it will
>> try to optimize the reuse of the connections. You also need to run "ab"
>> with "-k" to enable keep-alive. If you notice an important performance
>> boost, it means that the connection setup/teardown is expensive and might
>> be responsible for your slowdown.
>>
>> Another important point is the CPU pinning. You're using a 2xxx CPU which
>> is a dual-socket one. I don't know how your logical CPUs are spread over
>> physical ones, but in general for clear HTTP, you definitely want to
>> remove any inter-CPU communications, meaning that nginx, haproxy and the
>> NIC have to be attached to physical cores of the same CPU socket. Please
>> pick the socket the NIC is physically attached to so that the network
>> traffic doesn't have to pass via the QPI link. That way you'll ensure that
>> the L3 cache is populated with the data you're using and you won't
>> experience cache misses or flushes between sessions which bounce from one
>> CPU to the other.
>>
>> It can *possibly* make sense to move the nginx instances, in their own
>> containers, to the CPU not connected to the NIC, but I honestly don't
>> know; that's what you'll have to experiment with.
>>
>> Regards,
>> Willy
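As a small follow-up to the conntrack and NIC/socket locality points above,
here is a minimal sketch of how both could be checked on each server (the
interface name eth0 is an assumption; substitute the actual ixgbe device):

    # is connection tracking loaded, and how full is its table?
    lsmod | grep nf_conntrack
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max 2>/dev/null

    # which NUMA node is the NIC attached to, and which CPUs belong to each node?
    cat /sys/class/net/eth0/device/numa_node
    lscpu | grep -i 'numa node'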