Hi Marcin,

On 3/6/19 3:23 PM, Marcin Deranek wrote:
> Hi,
>
> In a process of evaluating performance of Intel Quick Assist Technology in
> conjunction with HAProxy software I acquired Intel C62x Chipset card for
> testing. I configured QAT engine in the following manner:
>
> * /etc/qat/c6xx_dev[012].conf
>
> [GENERAL]
> ServicesEnabled = cy
> ConfigVersion = 2
> CyNumConcurrentSymRequests = 512
> CyNumConcurrentAsymRequests = 64
> statsGeneral = 1
> statsDh = 1
> statsDrbg = 1
> statsDsa = 1
> statsEcc = 1
> statsKeyGen = 1
> statsDc = 1
> statsLn = 1
> statsPrime = 1
> statsRsa = 1
> statsSym = 1
> KptEnabled = 0
> StorageEnabled = 0
> PkeServiceDisabled = 0
> DcIntermediateBufferSizeInKB = 64
>
> [KERNEL]
> NumberCyInstances = 0
> NumberDcInstances = 0
>
> [SHIM]
> NumberCyInstances = 1
> NumberDcInstances = 0
> NumProcesses = 16
> LimitDevAccess = 0
>
> Cy0Name = "UserCY0"
> Cy0IsPolled = 1
> Cy0CoreAffinity = 0
>
> OpenSSL produces good results without warnings / errors:
>
> * No QAT involved
>
> $ openssl speed -elapsed rsa2048
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bits private rsa's for 10s: 10858 2048 bits private RSA's in 10.00s
> Doing 2048 bits public rsa's for 10s: 361207 2048 bits public RSA's in 10.00s
> OpenSSL 1.1.1a FIPS 20 Nov 2018
> built on: Tue Jan 22 20:43:41 2019 UTC
> options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int)
> blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe
> -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
> -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC
> -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT
> -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM
> -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM
> -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\""
> -DSYSTEM_CIPHERS_FILE="/opt/openssl/etc/crypto-policies/back-ends/openssl.config"
>                   sign    verify    sign/s verify/s
> rsa 2048 bits 0.000921s 0.000028s   1085.8  36120.7
>
> * QAT enabled
>
> $ openssl speed -elapsed -engine qat -async_jobs 32 rsa2048
> engine "qat" set.
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bits private rsa's for 10s: 205425 2048 bits private RSA's in
> 10.00s
> Doing 2048 bits public rsa's for 10s: 2150270 2048 bits public RSA's in 10.00s
> OpenSSL 1.1.1a FIPS 20 Nov 2018
> built on: Tue Jan 22 20:43:41 2019 UTC
> options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int)
> blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe
> -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
> -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC
> -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT
> -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM
> -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM
> -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\""
> -DSYSTEM_CIPHERS_FILE="/opt/openssl/etc/crypto-policies/back-ends/openssl.config"
>                   sign    verify    sign/s verify/s
> rsa 2048 bits 0.000049s 0.000005s  20542.5 215027.0
>
> So far so good. Unfortunately HAProxy 1.8 with QAT engine enabled
> periodically fails SSL checks of backend servers. The simplest
> configuration I could get to reproduce it:
>
> * /etc/haproxy/haproxy.cfg
>
> global
>     user lbengine
>     group lbengine
>     daemon
>     ssl-mode-async
>     ssl-engine qat
>     ssl-server-verify none
>     stats socket /run/lb_engine/process-1.sock user lbengine group
> lbengine mode 660 level admin expose-fd listeners process 1
>
> defaults
>     mode http
>     timeout check 5s
>     timeout connect 4s
>
> backend pool_all
>     default-server inter 5s
>
>     server server1 ip1:443 check ssl
>     server server2 ip2:443 check ssl
>     ...
>     server serverN ipN:443 check ssl
>
> Without QAT enabled everything works just fine - healthchecks do not flap.
> With QAT engine enabled random server healthchecks flap: they fail and then
> shortly after they recover, e.g.
>
> 2019-03-06T15:06:22+01:00 localhost hapee-lb[1832]: Server pool_all/server1
> is DOWN, reason: Layer6 timeout, check duration: 4000ms. 110 active and 0
> backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> 2019-03-06T15:06:32+01:00 localhost hapee-lb[1832]: Server pool_all/server1
> is UP, reason: Layer6 check passed, check duration: 13ms. 117 active and 0
> backup servers online. 0 sessions requeued, 0 total in queue.
>
> Increasing check frequency (lowering check interval) makes the problem occur
> more frequently. Does anybody have a clue why this is happening? Has anybody
> seen such behavior?
> Regards,
>
> Marcin Deranek
>
According to the documentation:

ssl-mode-async
  Adds SSL_MODE_ASYNC mode to the SSL context. This enables asynchronous TLS
  I/O operations if asynchronous capable SSL engines are used. The current
  implementation supports a maximum of 32 engines. The Openssl ASYNC API
  doesn't support moving read/write buffers and is not compliant with
  haproxy's buffer management. So the asynchronous mode is disabled on
  read/write operations (it is only enabled during initial and reneg
  handshakes).

Asynchronous mode is disabled on read/write operations and is only enabled
during the handshake. This means that for ciphering the engine is used in
blocking mode (not async), which can result in unpredictable behavior on
timers because the haproxy process will sporadically be fully blocked waiting
for the engine.

To avoid this issue, you should make sure QAT is used only for asymmetric
algorithms (such as RSA, DSA, ECDSA) and not for ciphering ones (AES and
everything else...).

The ssl-engine statement allows you to filter such algorithms:

ssl-engine <name> [algo <comma-separated list of algorithms>]

R,
Emeric
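
For illustration, the ssl-engine filter described above could look like the
fragment below. This is only a minimal sketch: the algorithm names RSA, DSA
and EC are assumed to be the ones OpenSSL accepts for engine defaults, and
it has not been checked against this particular QAT build.

    global
        ssl-mode-async
        # assumption: offload only asymmetric operations to the qat engine;
        # symmetric ciphers (e.g. AES) stay in software and cannot block
        # the process while waiting for the engine
        ssl-engine qat algo RSA,DSA,EC

If the health checks stop flapping with such a restriction in place, that
would point to blocking cipher operations in the engine as the cause.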