> On Mar 27, 2017, at 02:21, Emeric Brun <[email protected]> wrote:
> Intel's guys told me that the bug is related to the PRF and asked me to
> recompile the engine using '--disable_qat_prf'. Doing that, I can run some
> tests with the qat engine, but I'm facing stability issues:
>
> [root@centos haproxy]# /usr/local/ssl/bin/openssl speed -engine qat -elapsed
> -async_jobs 8 rsa2048
> [WARNING][e_qat.c:1531:bind_qat()] QAT Warnings enabled.
> engine "qat" set.
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bit private rsa's for 10s: 13442 2048 bit private RSA's in 10.01s
> Doing 2048 bit public rsa's for 10s: 290503 2048 bit public RSA's in 10.00s
> OpenSSL 1.1.0e 16 Feb 2017
> built on: reproducible build, date unspecified
> options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
> compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
> -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2
> -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM
> -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM
> -DOPENSSLDIR="\"/usr/local/ssl/ssl\""
> -DENGINESDIR="\"/usr/local/ssl/lib/engines-1.1\"" -Wa,--noexecstack
> sign verify sign/s verify/s
> rsa 2048 bits 0.000745s 0.000034s 1342.9 29050.3
Hmm, the numbers are lower than what I've seen:
gzhang@cache:~$ sudo /tmp/openssl_1.1.0_install/bin/openssl speed -engine qat
-elapsed -async_jobs 8 rsa2048
engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing 2048 bit private rsa's for 10s: 102435 2048 bit private RSA's in 10.00s
Doing 2048 bit public rsa's for 10s: 498989 2048 bit public RSA's in 10.00s
OpenSSL 1.1.0b 26 Sep 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM
-DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM
-DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DENGINE_CONF_DEBUG
-DOPENSSLDIR="\"/tmp/openssl_1.1.0_install\""
-DENGINESDIR="\"/tmp/openssl_1.1.0_install/lib/engines-1.1\"" -Wa,--noexecstack
sign verify sign/s verify/s
rsa 2048 bits 0.000098s 0.000020s 10243.5 49898.9
With -async_jobs set to 72, I am able to get:
sign verify sign/s verify/s
rsa 2048 bits 0.000025s 0.000006s 40496.4 170126.1
What kind of qat device do you have? Mine is an 8955 (Coleto Creek).
> Doing a benchmark using haproxy and the qat engine stalls at ~450
> connections/sec.
At one point during my test I was getting around 500 connections/sec with the
qat engine. IIRC the issue was fixed when I upped the number of open files
(ulimit -n) on the test client side. Your issue is probably not the same,
though.
>
> Stopping the injection, the haproxy process continues to consume CPU while
> doing nothing (top shows ~50% of one core, mainly in user):
Hmm, an idle haproxy process with qat enabled consumes about 5% of a core in my
test. 50% is too much :-(
>
> Here's the trace:
>
> [root@centos ~]# strace -p 27085
> Process 27085 attached
> epoll_wait(3, {}, 200, 1000) = 0
> epoll_wait(3, {}, 200, 1000) = 0
> epoll_wait(3, {}, 200, 1000) = 0
> epoll_wait(3, {}, 200, 1000) = 0
> epoll_wait(3, {}, 200, 1000) = 0
>
> epoll wakes up once per second, which seems normal.
>
> If I continue to inject while reusing the same session (session resumption,
> no RSA computation), I observe ~1500 connections/sec.
>
> But after stopping the injection, the process consumes 156% of CPU while
> doing nothing (core 1: 20% user and 80% system; core 2: 76% user):
>
> Here's the trace:
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP,
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}},
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53,
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP,
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}},
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48,
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP,
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}},
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43,
> u64=43}}}, 200, 1000) = 15
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP,
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}},
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53,
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP,
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}},
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48,
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP,
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}},
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43,
> u64=43}}}, 200, 1000) = 15
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP,
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}},
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53,
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP,
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}},
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48,
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP,
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}},
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43,
> u64=43}}}, 200, 1000) = 15
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP,
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}},
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53,
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP,
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}},
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48,
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP,
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}},
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43,
> u64=43}}}, 200, 1000) = 15
>
> epoll_wait wakes up in a very fast loop.
>
> Once this point is reached, restarting the injection will sometimes crash
> haproxy with a segfault.
>
> Here is my haproxy config:
> global
> tune.ssl.default-dh-param 2048
> ssl-engine qat
> ssl-async
>
> listen gg
> mode http
Hmm, I have been testing with tcp mode (which is what we use in our prod
environment). Any chance you could run a test with tcp mode and see if you can
reproduce the same issue?
Thanks,
Grant