Hi Marcin,

Do you have SSL enabled on the server side? If so, could you replace the 
health check with a simple TCP check (without SSL)?

Regarding the show info/lsof output, it seems there are no more sessions on 
the client side, but SSL jobs remain (CurrSslConns), and I suspect the health 
checks fail to clean up their SSL sessions when using QAT. (This is just an 
assumption.)
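
To make that pattern easier to spot across several old workers, here is a 
minimal Python sketch that parses 'show info' output and reports SSL 
connections still counted on a stopping worker with no client connections. 
The function names are hypothetical and the heuristic is only my reading of 
the counters, not an HAProxy feature. The sample values are taken from the 
output quoted below.

```python
def parse_show_info(text):
    """Parse 'Key: value' lines from 'show info' output into a dict."""
    info = {}
    for line in text.splitlines():
        # Tolerate mail-quote prefixes such as '> ' in pasted output.
        line = line.lstrip("> ").strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Key: value'
        value = value.strip()
        # Keep plain integers as int, everything else as a string.
        info[key.strip()] = int(value) if value.isdigit() else value
    return info


def leaked_ssl_conns(info):
    """Return CurrSslConns for a stopping worker that reports no client
    connections, or 0 when the pattern does not match (assumed heuristic)."""
    if info.get("Stopping") == 1 and info.get("CurrConns") == 0:
        return info.get("CurrSslConns", 0)
    return 0


# Subset of the 'show info' output of old worker 34858 quoted below.
SAMPLE = """\
Pid: 34858
CurrConns: 0
CurrSslConns: 20
Stopping: 1
Jobs: 25
Unstoppable Jobs: 4
Listeners: 4
"""

info = parse_show_info(SAMPLE)
print(leaked_ssl_conns(info))  # prints 20
```

A non-zero result on an old worker would be consistent with the assumption 
above, but it does not prove that the health checks are the cause.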

R,
Emeric

On 4/12/19 4:43 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/10/19 2:20 PM, Emeric Brun wrote:
> 
>> On 4/10/19 1:02 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> Our process limit in the QAT configuration is quite high (128) and I was 
>>> able to run 100+ openssl processes without a problem. According to Joel 
>>> from Intel, the problem is in the cleanup code, presumably when HAProxy 
>>> exits and frees up QAT resources. Will try to see if I can get more debug 
>>> information.
>>
>> I've just taken a look.
>>
>> Engine deinit is called:
>>
>> haproxy/src/ssl_sock.c
>> #ifndef OPENSSL_NO_ENGINE
>> void ssl_free_engines(void) {
>>          struct ssl_engine_list *wl, *wlb;
>>          /* free up engine list */
>>          list_for_each_entry_safe(wl, wlb, &openssl_engines, list) {
>>                  ENGINE_finish(wl->e);
>>                  ENGINE_free(wl->e);
>>                  LIST_DEL(&wl->list);
>>                  free(wl);
>>          }
>> }
>> #endif
>> ...
>> #ifndef OPENSSL_NO_ENGINE
>>          hap_register_post_deinit(ssl_free_engines);
>> #endif
>>
>> I don't know how many haproxy processes you are running but if I describe 
>> the complete scenario of processes you may note that we reach a limit:
> 
> It's very unlikely to be the limit, as I lowered the number of HAProxy 
> processes (from 10 to 4) while keeping QAT NumProcesses equal to 32. HAProxy 
> would hit this limit while spawning new instances, not while tearing down 
> old ones. In such a case QAT would not be initialized for some HAProxy 
> instances (you would see 1 thread vs 2 threads). About threads, read below.
> 
>> - the master sends a signal to older processes; those processes will unbind 
>> and stop accepting new conns but continue to serve remaining sessions until 
>> the end.
>> - new processes are started, immediately init the engine and accept 
>> new conns.
>> - when no more sessions remain on an old process, it calls the deinit 
>> function of the engine before exiting
> 
> What I noticed is that each HAProxy with QAT enabled has 2 threads (LWP) - 
> looks like QAT adds an extra thread to the process itself. Would adding an 
> extra thread possibly mess up the HAProxy termination sequence?
> Our setup is to run HAProxy in multi-process mode - no threads (or 1 thread 
> per process if you wish).
> 
>> I also suppose that old processes are stuck because there are some 
>> sessions which never ended. Perhaps I'm wrong, but an strace on an old 
>> process could be interesting to know why those processes are stuck.
> 
> strace only shows these:
> 
> [pid 11392] 23:24:43.164619 epoll_wait(4,  <unfinished ...>
> [pid 11392] 23:24:43.164687 <... epoll_wait resumed> [], 200, 0) = 0
> [pid 11392] 23:24:43.164761 epoll_wait(4,  <unfinished ...>
> [pid 11392] 23:24:43.953203 <... epoll_wait resumed> [], 200, 788) = 0
> [pid 11392] 23:24:43.953286 epoll_wait(4,  <unfinished ...>
> [pid 11392] 23:24:43.953355 <... epoll_wait resumed> [], 200, 0) = 0
> [pid 11392] 23:24:43.953419 epoll_wait(4,  <unfinished ...>
> [pid 11392] 23:24:44.010508 <... epoll_wait resumed> [], 200, 57) = 0
> [pid 11392] 23:24:44.010589 epoll_wait(4,  <unfinished ...>
> 
> There are no connections: the stuck process only has a UDP socket on a 
> random port:
> 
> [root@externallb-124 ~]# lsof -p 6307|fgrep IPv4
> hapee-lb 6307 lbengine   83u     IPv4         3598779351      0t0 UDP *:19573
> 
> 
>> You can also use the 'master CLI' (enabled with '-S') to check whether 
>> sessions remain on those older processes (doc is available in 
>> management.txt)
> 
> Before reload
> * systemd
>  Main PID: 33515 (hapee-lb)
>    Memory: 1.6G
>    CGroup: /system.slice/hapee-1.8-lb.service
>            ├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            └─34861 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
> * master CLI
> show proc
> #<PID>          <type>          <relative PID>  <reloads>       <uptime>
> 33515           master          0               0               0d 00h00m31s
> # workers
> 34858           worker          1               0               0d 00h00m31s
> 34859           worker          2               0               0d 00h00m31s
> 34860           worker          3               0               0d 00h00m31s
> 34861           worker          4               0               0d 00h00m31s
> 
> After reload:
> * systemd
>  Main PID: 33515 (hapee-lb)
>    Memory: 3.1G
>    CGroup: /system.slice/hapee-1.8-lb.service
>            ├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 
> 34859 34860 34861 -x /run/lb_engine/process-1.sock
>            ├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─34861 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>            ├─41871 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 
> 34859 34860 34861 -x /run/lb_engine/process-1.sock
>            ├─41872 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 
> 34859 34860 34861 -x /run/lb_engine/process-1.sock
>            ├─41873 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 
> 34859 34860 34861 -x /run/lb_engine/process-1.sock
>            └─41874 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 
> 34859 34860 34861 -x /run/lb_engine/process-1.sock
> * master CLI
> show proc
> #<PID>          <type>          <relative PID>  <reloads>       <uptime>
> 33515           master          0               1               0d 00h01m33s
> # workers
> 41871           worker          1               0               0d 00h00m45s
> 41872           worker          2               0               0d 00h00m45s
> 41873           worker          3               0               0d 00h00m45s
> 41874           worker          4               0               0d 00h00m45s
> # old workers
> 34858           worker          [was: 1]        1               0d 00h01m33s
> 34859           worker          [was: 2]        1               0d 00h01m33s
> 34860           worker          [was: 3]        1               0d 00h01m33s
> 34861           worker          [was: 4]        1               0d 00h01m33s
> 
> and
> 
> @!34858 show info
> Name: HAProxy
> Version: 1.8.0-2.0.0-195.793
> Release_date: 2019/03/19
> Nbthread: 1
> Nbproc: 4
> Process_num: 1
> Pid: 34858
> Uptime: 0d 0h03m24s
> Uptime_sec: 204
> Memmax_MB: 0
> PoolAlloc_MB: 1
> PoolUsed_MB: 1
> PoolFailed: 0
> Ulimit-n: 2006423
> CurrConns: 0
> CumConns: 354
> CumReq: 342
> CurrSslConns: 20
> CumSslConns: 35928
> Maxpipes: 0
> PipesUsed: 0
> PipesFree: 0
> ConnRate: 0
> ConnRateLimit: 0
> MaxConnRate: 65
> SessRate: 0
> SessRateLimit: 0
> MaxSessRate: 62
> SslRate: 0
> SslRateLimit: 0
> MaxSslRate: 52
> SslFrontendKeyRate: 0
> SslFrontendMaxKeyRate: 52
> SslFrontendSessionReuse_pct: 0
> SslBackendKeyRate: 0
> SslBackendMaxKeyRate: 2988
> SslCacheLookups: 0
> SslCacheMisses: 0
> CompressBpsIn: 0
> CompressBpsOut: 0
> CompressBpsRateLim: 0
> Tasks: 5849
> Run_queue: 1
> Idle_pct: 100
> Stopping: 1
> Jobs: 25
> Unstoppable Jobs: 4
> Listeners: 4
> DroppedLogs: 0
> 
> Regards,
> 
> Marcin Deranek