Hi,
again a performance topic.
I did some further testing/benchmarks with ECC and nbproc > 1. I was
testing on an "E5-2697 v4", and the first thing I noticed was that
HAProxy has a fixed upper limit of 64 for nbproc (as far as I can tell,
the process mask is kept in a single long, hence 64 bits). So the setup:
HAProxy server with the mentioned E5:
global
user haproxy
group haproxy
maxconn 75000
log 127.0.0.2 local0
ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDH
ssl-default-bind-options no-sslv3 no-tls-tickets
tune.ssl.default-dh-param 1024
nbproc 64
defaults
timeout client 300s
timeout server 300s
timeout queue 60s
timeout connect 7s
timeout http-request 10s
maxconn 75000
bind-process 1
# HTTP
frontend haproxy_test_http
bind :65410
mode http
option httplog
option httpclose
log global
default_backend bk_ram
# ECC
frontend haproxy_test-ECC
bind-process 3-64
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
option httplog
option httpclose
log global
default_backend bk_ram
backend bk_ram
mode http
fullconn 75000 # just in case the lower default limit is reached...
errorfile 503 /etc/haproxy/test.error
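For completeness: I did not pin the processes to specific cores. In
case someone wants to rule that out, 1:1 pinning in 1.6 would look
roughly like this (one cpu-map line per process, since as far as I know
1.6 has no range-to-range mapping):
cpu-map 1 0
cpu-map 2 1
...
cpu-map 64 63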
/etc/haproxy/test.error:
HTTP/1.0 200
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

Test123456
The ECC key:
openssl ecparam -genkey -name prime256v1 -out /etc/haproxy/test.pem-ECC.key
openssl req -new -x509 -sha256 -key /etc/haproxy/test.pem-ECC.key -days 365 -nodes -subj "/O=ECC Test/CN=test.example.com" -out /etc/haproxy/test.pem-ECC.crt
cat /etc/haproxy/test.pem-ECC.key /etc/haproxy/test.pem-ECC.crt > /etc/haproxy/test.pem-ECC
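To double-check that the generated key/cert really is prime256v1,
something like this should do (just a sanity check, not part of the
setup):
openssl ec -in /etc/haproxy/test.pem-ECC.key -noout -text | grep 'ASN1 OID'
openssl x509 -in /etc/haproxy/test.pem-ECC.crt -noout -text | grep -A1 'Public Key Algorithm'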
So then I tried a local "ab":
ab -n 5000 -c 250 https://127.0.0.1:65420/
Server Hostname: 127.0.0.1
Server Port: 65420
SSL/TLS Protocol: TLSv1/SSLv3,ECDHE-ECDSA-AES128-GCM-SHA256,256,128
Document Path: /
Document Length: 107 bytes
Concurrency Level: 250
Time taken for tests: 3.940 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Non-2xx responses: 5000
Total transferred: 1060000 bytes
HTML transferred: 535000 bytes
Requests per second: 1268.95 [#/sec] (mean)
Time per request: 197.013 [ms] (mean)
Time per request: 0.788 [ms] (mean, across all concurrent requests)
Transfer rate: 262.71 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 54 138 34.7 162 193
Processing: 8 51 34.8 24 157
Waiting: 3 40 31.6 18 113
Total: 177 189 7.5 188 333
Percentage of the requests served within a certain time (ms)
50% 188
66% 189
75% 190
80% 190
90% 191
95% 192
98% 196
99% 205
100% 333 (longest request)
The same test with just nbproc 1 gave about ~1500 requests/s, so
roughly 1.5k * nbproc is what I expected, or at least something near
that value.
Then I set up 61 EC2 instances, standard t2.micro setup. They're
somewhat slower at ~1k ECC requests per second, but that's OK for the
test. HTTP (one proc) via localhost was around 27-28k r/s, remote (EC2)
~4500.
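For reference, the HTTP side was benchmarked with essentially the same
ab invocation against the plain frontend (the request count here is
just illustrative):
ab -n 50000 -c 250 http://127.0.0.1:65410/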
So then I started "ab" in parallel on each instance, and throughput
dropped to about ~4xx requests/s for ECC on each node, which is far
below the ~1500 (single proc) or ~1300 (multi proc) baseline - much
worse than I expected, tbh. I thought it would scale more or less
linearly up to nbproc processes and only degrade beyond that. I did
some basic checks to figure out the reason/bottleneck, and to me it
looks like a lot of context switches/epoll_wait. (h)top shows that
ksoftirqd plus one haproxy process are burning 100% CPU of a single
core; the load is not distributed across multiple cores. I'm not sure
yet whether it's related to the SSL part, HAProxy or some kernel foo.
HTTP performs better: ~27k total on localhost, ~5400 with a single ab
via EC2, and still ~2100 per EC2 instance with a total of 15 instances
- and the HTTP frontend runs on just a single process!
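Some very rough math on that: 62 processes serve the ECC frontend
(bind-process 3-64), so I'd have expected something in the region of
62 * 1.5k ≈ 93k r/s aggregate. What I actually got is about 61 clients
* ~450 r/s ≈ 27k r/s, i.e. roughly ~450 r/s per HAProxy process.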
So I wonder what the reason for that single saturated core is. Is it
the cause of the rather poor performance, since it affects all of those
processes? Can we distribute that work across multiple cores/processes
as well? Any ideas?
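One thing I might try next (untested; assuming the traffic comes in on
eth0 and the NIC has a single RX queue whose softirq work lands on one
core) is to check where the interrupts go and to spread the receive
processing across cores via RPS:
grep eth0 /proc/interrupts
mpstat -P ALL 1
# allow all 64 logical CPUs to do RX processing; the mask is a comma-
# separated set of 32-bit hex words, adjust to the actual CPU count
echo ffffffff,ffffffff > /sys/class/net/eth0/queues/rx-0/rps_cpus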
Oh, and I was using 1.6.5:
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau <[email protected]>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
OPTIONS = USE_LIBCRYPT=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1t 3 May 2016 (VERSIONS DIFFER!)
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.30 2012-02-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
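Side note on the "VERSIONS DIFFER" warning above: to see which
libssl/libcrypto the binary actually loads at runtime, this should
tell:
ldd $(which haproxy) | grep -E 'ssl|crypto'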
I actually thought I was already running 1.6.9 on that host, so I
upgraded and re-ran some of the benchmarks, but at first glance the
results look almost identical.
--
Regards,
Christian Ruppert