(Resending, as I unintentionally dropped the ML earlier.)

Hi Pavlos and Willy,
Thank you very much for your responses. I have been silent because my
development systems were shut down for re-racking, so I had no chance to get
results for Pavlos's suggestions. Thanks Willy, I will make your suggested
changes (my settings were misguided, as I am very new to haproxy) and report
the findings once my systems are back online, hopefully tomorrow. (With the
move to Lua, I hope to see even "better" performance numbers!)

Regards,
- Krishna Kumar

On Wed, Apr 1, 2015 at 3:27 PM, Willy Tarreau <[email protected]> wrote:
> Hi,
>
> On Mon, Mar 30, 2015 at 10:43:51AM +0530, Krishna Kumar Unnikrishnan
> (Engineering) wrote:
> > Hi all,
> >
> > I am testing haproxy as follows:
> >
> > System1: 24 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 64 GB. This
> > system is running a 3.19.0 kernel and hosts the following servers:
> >     1. nginx1 server - cpu 1-2, 1G memory, runs as a Linux
> >        container using the cpuset.cpus feature.
> >     2. nginx2 server - cpu 3-4, 1G memory, runs via LXC.
> >     3. nginx3 server - cpu 5-6, 1G memory, runs via LXC.
> >     4. nginx4 server - cpu 7-8, 1G memory, runs via LXC.
> >     5. haproxy - cpu 9-10, 1G memory, runs via LXC. Runs haproxy
> >        ver 1.5.8, configured with the above 4 containers' IP
> >        addresses as the backend.
> >
> > System2: 56 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 128 GB. This
> > system is running 3.19.0 and runs 'ab' either against the haproxy node
> > or directly against an nginx container. System1 & System2 are locally
> > connected via a switch with Intel 10G cards.
> >
> > With very small packets of 64 bytes, I am getting the following results:
> >
> > A.
> > ab -n 100000 -c 4096 http://nginx1:80/64
> > -----------------------------------------
> >
> > Concurrency Level:      4096
> > Time taken for tests:   3.232 seconds
> > Complete requests:      100000
> > Failed requests:        0
> > Total transferred:      28800000 bytes
> > HTML transferred:       6400000 bytes
> > Requests per second:    30943.26 [#/sec] (mean)
> > Time per request:       132.371 [ms] (mean)
> > Time per request:       0.032 [ms] (mean, across all concurrent requests)
> > Transfer rate:          8702.79 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        9   65  137.4     45    1050
> > Processing:     4   52   25.3     51     241
> > Waiting:        3   37   19.2     35     234
> > Total:         16  117  146.1    111    1142
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%    111
> >   66%    119
> >   75%    122
> >   80%    124
> >   90%    133
> >   95%    215
> >   98%    254
> >   99%   1126
> >  100%   1142 (longest request)
> >
> > B. ab -n 100000 -c 4096 http://haproxy:80/64
> > ----------------------------------------------
> >
> > Concurrency Level:      4096
> > Time taken for tests:   5.503 seconds
> > Complete requests:      100000
> > Failed requests:        0
> > Total transferred:      28800000 bytes
> > HTML transferred:       6400000 bytes
> > Requests per second:    18172.96 [#/sec] (mean)
> > Time per request:       225.390 [ms] (mean)
> > Time per request:       0.055 [ms] (mean, across all concurrent requests)
> > Transfer rate:          5111.15 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        0  134  358.3     23    3033
> > Processing:     2   61   47.7     51     700
> > Waiting:        2   50   43.0     42     685
> > Total:          7  194  366.7     79    3122
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%     79
> >   66%    105
> >   75%    134
> >   80%    159
> >   90%    318
> >   95%   1076
> >   98%   1140
> >   99%   1240
> >  100%   3122 (longest request)
> >
> > I expected haproxy to deliver better results with multiple connections,
> > since haproxy will round-robin between the 4 servers. I have done no
> > tuning, and have used the config file at the end of this mail.
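[Editor's note: the headline figures in the two runs above can be cross-checked directly; a quick shell sketch, using the request counts and wall-clock times quoted from the `ab` output (the results agree with ab's own numbers to within rounding):]

```shell
# Cross-check the quoted ab numbers: requests/sec equals complete requests
# divided by wall-clock time, and the proxied run is ~1.7x slower.
awk 'BEGIN {
    direct  = 100000 / 3.232   # nginx1 direct (ab reports 30943.26 req/s)
    proxied = 100000 / 5.503   # via haproxy   (ab reports 18172.96 req/s)
    printf "direct=%.0f proxied=%.0f slowdown=%.2fx\n", direct, proxied, direct / proxied
}'
```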
> > With a 256K file size, the times are slightly better for haproxy vs
> > nginx. I notice that %requests served is similar for both cases until
> > about 90%.
>
> I'm seeing a very simple and common explanation for this. You're stressing
> the TCP stack and it becomes the bottleneck. Both haproxy and nginx make
> very little use of userland and spend most of their time in the kernel,
> so by putting both of them on the same system image, you're still subject
> to the session table lookups, locking and whatever limits the processing.
> And in fact, by adding haproxy in front of nginx on the same system, you
> have effectively doubled the kernel's job, and you're measuring about half
> of the performance, so there's nothing much surprising here.
>
> Please check the CPU usage as Pavlos mentioned. I'm guessing that your
> system is spending most of its time in system and/or softirq. Also, maybe
> you have conntrack enabled on the system. In this case, having the
> components on the same machine will triple the conntrack session rate,
> effectively increasing its work.
>
> There's something you can try in your config below to see if the
> connection rate is mostly responsible for the trouble:
>
> > global
> >     maxconn 65536
> >     ulimit-n 65536
>
> Please remove ulimit-n BTW, it's wrong and not needed.
>
> >     daemon
> >     quiet
> >     nbproc 2
> >     user haproxy
> >     group haproxy
> >
> > defaults
> >     #log global
> >     mode http
> >     option dontlognull
> >     retries 3
> >     option redispatch
> >     maxconn 65536
> >     timeout connect 5000
> >     timeout client 50000
> >     timeout server 50000
> >
> > listen my_ha_proxy 192.168.1.110:80
> >     mode http
> >     stats enable
> >     stats auth someuser:somepassword
> >     balance roundrobin
> >     cookie JSESSIONID prefix
> >     option httpclose
>
> Here, please remove "option httpclose" and use "option prefer-last-server"
> instead. Then comment out the cookie line for the duration of the test; it
> will try to optimize the reuse of the connections.
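[Editor's note: with Willy's two changes applied, the listen section from the quoted config might look like this; a sketch only, untested, and "option prefer-last-server" requires haproxy 1.5+, which matches the 1.5.8 in use here:]

```
listen my_ha_proxy 192.168.1.110:80
    mode http
    stats enable
    stats auth someuser:somepassword
    balance roundrobin
    # cookie JSESSIONID prefix   # commented out for the duration of the test
    # option httpclose           # removed: forces a connection close per request
    option prefer-last-server    # prefer reusing the last server's connection
    # (backend server lines unchanged; they are not shown in the quoted config)
```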
> You also need to run "ab" with "-k" to enable keep-alive. If you notice an
> important performance boost, it means that the connection setup/teardown
> is expensive and that it might be responsible for your slowdown.
>
> Another important point is the CPU pinning. You're using a 2xxx CPU which
> is a dual-socket one. So I don't know how your logical CPUs are spread
> over physical ones, but in general for clear HTTP, you definitely want to
> remove any inter-CPU communications, meaning that nginx, haproxy and the
> NIC have to be attached to physical cores of the same CPU socket. Please
> pick the socket the NIC is physically attached to so that the network
> traffic doesn't have to pass via the QPI link. That way you'll ensure that
> the L3 cache will be populated with the data you're using, and you won't
> experience cache misses nor flushes between each session which bounces
> from one CPU to the other one.
>
> It can *possibly* make sense to move the nginx instances, in their own
> containers, to the CPU not connected to the NIC, but I honestly don't
> know; that's what you'll have to experiment with.
>
> Regards,
> Willy
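[Editor's note: the keep-alive and pinning advice above translates into commands along these lines; a sketch in which the hostname `haproxy`, the interface name `eth0`, and cores 9-10 are assumptions taken from this thread, and the function only prints the commands so nothing runs by accident:]

```shell
# Print, rather than run, the follow-up steps suggested above.
suggest_followups() {
    # 1. Re-run the benchmark with HTTP keep-alive enabled
    echo 'ab -k -n 100000 -c 4096 http://haproxy:80/64'
    # 2. Find which NUMA node the 10G NIC sits on (eth0 is an assumption)
    echo 'cat /sys/class/net/eth0/device/numa_node'
    # 3. Pin every haproxy process (nbproc 2) to the cores used in the test
    echo 'for p in $(pidof haproxy); do taskset -cp 9,10 "$p"; done'
}
suggest_followups
```

If the NUMA node printed in step 2 does not contain cores 9-10, the cpuset assignments for the haproxy and nginx containers would need to move to cores on the NIC's socket, per Willy's note about the QPI link.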

