> On Mar 27, 2017, at 02:21, Emeric Brun <[email protected]> wrote:
> The Intel guys told me that the bug is related to prf and asked me to recompile 
> the engine using '--disable_qat_prf'. Having done that, I can run some tests with 
> the qat engine, but I'm facing stability issues:
> 
> [root@centos haproxy]# /usr/local/ssl/bin/openssl speed -engine qat -elapsed 
> -async_jobs 8 rsa2048
> [WARNING][e_qat.c:1531:bind_qat()] QAT Warnings enabled.
> engine "qat" set.
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bit private rsa's for 10s: 13442 2048 bit private RSA's in 10.01s
> Doing 2048 bit public rsa's for 10s: 290503 2048 bit public RSA's in 10.00s
> OpenSSL 1.1.0e  16 Feb 2017
> built on: reproducible build, date unspecified
> options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
> compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS 
> -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 
> -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM 
> -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM 
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM 
> -DOPENSSLDIR="\"/usr/local/ssl/ssl\"" 
> -DENGINESDIR="\"/usr/local/ssl/lib/engines-1.1\""  -Wa,--noexecstack
>                  sign    verify    sign/s verify/s
> rsa 2048 bits 0.000745s 0.000034s   1342.9  29050.3

Hmm, the numbers are less than what I've seen:

gzhang@cache:~$ sudo /tmp/openssl_1.1.0_install/bin/openssl speed -engine qat 
-elapsed -async_jobs 8 rsa2048
engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing 2048 bit private rsa's for 10s: 102435 2048 bit private RSA's in 10.00s
Doing 2048 bit public rsa's for 10s: 498989 2048 bit public RSA's in 10.00s
OpenSSL 1.1.0b  26 Sep 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS 
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM 
-DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM 
-DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DENGINE_CONF_DEBUG 
-DOPENSSLDIR="\"/tmp/openssl_1.1.0_install\"" 
-DENGINESDIR="\"/tmp/openssl_1.1.0_install/lib/engines-1.1\""  -Wa,--noexecstack
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000098s 0.000020s  10243.5  49898.9

With -async_jobs set to 72, I am able to get:
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000025s 0.000006s  40496.4 170126.1
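
That was the same invocation as above, with just the async jobs count raised:

sudo /tmp/openssl_1.1.0_install/bin/openssl speed -engine qat -elapsed -async_jobs 72 rsa2048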

What kind of qat device do you have? Mine is an 8955 (Coleto Creek).
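
If you're not sure, the QAT endpoints should show up as PCI co-processor devices, so 
something along these lines usually lists them (the exact device string depends on the 
part):

lspci | grep -i co-processor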

> A benchmark using haproxy with the qat engine stalls at ~450 connections/sec

At one point during my tests I was getting around 500 connections/sec with the qat 
engine. IIRC the issue was fixed when I upped the number of open files (ulimit -n) 
on the test client side. Your issue is probably not the same, though.
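
On the client it was just a matter of raising the limit in the shell that starts the 
load generator, before the run, e.g. (65536 is only an example value; pick something 
that covers your peak number of concurrent connections):

ulimit -n 65536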

> 
> After stopping the injection, the haproxy process continues to consume CPU while 
> doing nothing (top shows ~50% of one core, mainly in user):

Hmm, an idle haproxy process with qat enabled consumes about 5% of a core in my 
test. 50% is too much :-(

> 
> Here is the trace:
> 
> [root@centos ~]# strace -p 27085
> Process 27085 attached
> epoll_wait(3, {}, 200, 1000)            = 0
> epoll_wait(3, {}, 200, 1000)            = 0
> epoll_wait(3, {}, 200, 1000)            = 0
> epoll_wait(3, {}, 200, 1000)            = 0
> epoll_wait(3, {}, 200, 1000)            = 0
> 
> The epoll wakes up once per second, which seems normal.
> 
> If I continue to inject while reusing the same key (session resumption, no rsa 
> computation), I observe ~1500 connections/sec
> 
> But after stopping the injection, the process consumes 156% of CPU while doing 
> nothing (core 1: 20% in user and 80% in system; core 2: 76% in user):
> 
> Here is the trace:
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53, 
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48, 
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43, 
> u64=43}}}, 200, 1000) = 15
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53, 
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48, 
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43, 
> u64=43}}}, 200, 1000) = 15
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53, 
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48, 
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43, 
> u64=43}}}, 200, 1000) = 15
> epoll_wait(3, {{EPOLLIN|EPOLLRDHUP, {u32=57, u64=57}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=56, u64=56}}, {EPOLLIN|EPOLLRDHUP, {u32=55, u64=55}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=54, u64=54}}, {EPOLLIN|EPOLLRDHUP, {u32=53, 
> u64=53}}, {EPOLLIN|EPOLLRDHUP, {u32=52, u64=52}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=51, u64=51}}, {EPOLLIN|EPOLLRDHUP, {u32=50, u64=50}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=49, u64=49}}, {EPOLLIN|EPOLLRDHUP, {u32=48, 
> u64=48}}, {EPOLLIN|EPOLLRDHUP, {u32=47, u64=47}}, {EPOLLIN|EPOLLRDHUP, 
> {u32=45, u64=45}}, {EPOLLIN|EPOLLRDHUP, {u32=42, u64=42}}, 
> {EPOLLIN|EPOLLRDHUP, {u32=44, u64=44}}, {EPOLLIN|EPOLLRDHUP, {u32=43, 
> u64=43}}}, 200, 1000) = 15
> 
> epoll_wait wakes up in a very fast loop.
> 
> When this point is reached, some of the time re-starting the injection will 
> crash haproxy with a segfault.
> 
> Here is my haproxy config:
> global
>        tune.ssl.default-dh-param 2048
>        ssl-engine qat
>        ssl-async
> 
> listen gg
>        mode http

Hmm, I have been testing with tcp mode (which is what we use in our prod env). Any 
chance you could run a test with tcp mode and see if you can reproduce the same 
issue?
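
A tcp-mode equivalent of your listen section would be something like this (the bind 
address, certificate path and backend are just placeholders; keep the same global 
ssl-engine/ssl-async settings you already have):

listen gg
        mode tcp
        bind :8443 ssl crt /path/to/cert.pem
        server s1 127.0.0.1:8080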

Thanks,

Grant
