Hi Emeric,
On 5/3/19 4:50 PM, Emeric Brun wrote:
I have a test platform here, but it uses qat_contig_mem rather than usdm_drv,
and I cannot reproduce this issue (I'm using QAT 1.5, as the documentation
recommends for my chip).
I see. I use qat 1.7 and qat-engine 0.5.40.
Anyway, could you recompile a haproxy binary if I provide you with a test
patch?
Sure, that should not be a problem.
The idea is to perform a deinit in the master to force those /dev descriptors
closed at each reload. Perhaps it won't fix our issue, but this fd leak should
not happen.
Hope this will give us at least some more insight.
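To illustrate why an explicit close in the master matters, here is a minimal shell sketch. It is not haproxy or QAT code. It assumes the leak follows the usual pattern in which a descriptor opened without close-on-exec is inherited by every process started afterwards. The helper name fd_open_in_child, the descriptor number 9 and the use of /dev/null as a stand-in for /dev/usdm_drv are all made up for the illustration.

```shell
#!/bin/sh
# fd_open_in_child N: print "yes" if descriptor N is still open in a freshly
# exec'ed child shell, "no" otherwise. (Hypothetical helper for illustration.)
fd_open_in_child() {
    # "true <&N" fails with "Bad file descriptor" when N is not open.
    if sh -c "true <&$1" 2>/dev/null; then echo yes; else echo no; fi
}

exec 9</dev/null                 # stand-in for a device left open by an engine
before=$(fd_open_in_child 9)     # the child inherits descriptor 9

exec 9<&-                        # the "deinit": close it before starting children
after=$(fd_open_in_child 9)      # the child no longer sees descriptor 9

echo "inherited before close: $before, after close: $after"
```

If the master keeps such a descriptor open across a reload and the engine opens the device again on the next init, the count would grow by one per reload, which matches the listings in this thread.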
Regards,
Marcin Deranek
On 5/3/19 4:21 PM, Marcin Deranek wrote:
Hi Emeric,
It looks like the master leaks a /dev/usdm_drv descriptor on every reload:
# systemctl restart haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
lrwx------ 1 root root 64 May 3 15:40 9 -> /dev/usdm_drv
# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May 3 15:40 10 -> /dev/usdm_drv
lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
lrwx------ 1 root root 64 May 3 15:40 9 -> /dev/usdm_drv
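The listings above can be condensed into one count per device, which makes the growth easier to spot after each reload. This is only a sketch: the function name count_dev_fds is made up, and it assumes the `ls -l` output format shown above, with the link target in the last field.

```shell
#!/bin/sh
# count_dev_fds: read `ls -l /proc/<pid>/fd` output on stdin and print
# "<count> <target>" for every target under /dev, sorted by target.
count_dev_fds() {
    awk 'NF >= 3 && $(NF-1) == "->" && $NF ~ /^\/dev\// { n[$NF]++ }
         END { for (t in n) print n[t], t }' | sort -k2
}

# Sample input: the master listing after the second reload.
count_dev_fds <<'EOF'
lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May 3 15:40 10 -> /dev/usdm_drv
lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
lrwx------ 1 root root 64 May 3 15:40 9 -> /dev/usdm_drv
EOF
```

On a live system the same function would be fed with `ls -l /proc/$(cat haproxy.pid)/fd | count_dev_fds`.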
Obviously the workers inherit this from the master. Looking at the workers, I
see the following:
* 1st gen:
# ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio19
/dev/uio3
/dev/uio35
/dev/usdm_drv
* 2nd gen:
# ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio23
/dev/uio39
/dev/uio7
/dev/usdm_drv
/dev/usdm_drv
Looks like only /dev/usdm_drv is leaked.
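The comparison between the two worker generations can also be done mechanically. The sketch below assumes each listing has been saved to a file with one device path per line, as produced by the awk command above. The function name grown_devs is made up. Devices that appear in only one generation, such as the uio nodes, are deliberately ignored.

```shell
#!/bin/sh
# grown_devs OLD NEW: print every device present in both listings whose
# descriptor count is higher in NEW than in OLD, as "<dev> <old> -> <new>".
grown_devs() {
    awk 'FILENAME == ARGV[1] { old[$0]++; next }
         { new[$0]++ }
         END {
             for (d in new)
                 if ((d in old) && new[d] > old[d])
                     print d, old[d], "->", new[d]
         }' "$1" "$2" | sort
}
```

For example, `grown_devs gen1.txt gen2.txt`, where both file names are placeholders for the saved listings of the 1st and 2nd generation workers.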
Cheers,
Marcin Deranek
On 5/3/19 2:22 PM, Emeric Brun wrote:
Hi Marcin,
On 4/29/19 6:41 PM, Marcin Deranek wrote:
Hi Emeric,
On 4/29/19 3:42 PM, Emeric Brun wrote:
Hi Marcin,
I also have a contact at Intel who told me to try this option on the QAT engine:
--disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
Disable/Enable the engine from being initialized automatically following a
fork operation. This is useful in a situation where you want to tightly
control how many instances are being used for processes. For instance if an
application forks to start a process that does not utilize QAT currently
the default behaviour is for the engine to still automatically get started
in the child using up an engine instance. After using this flag either the
engine needs to be initialized manually using the engine message:
INIT_ENGINE or will automatically get initialized on the first QAT crypto
operation. The initialization on fork is enabled by default.
I tried building QAT Engine with auto init disabled, but that did not help. Now
I get the following during startup:
2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 Unable to
initialize memory file handle /dev/usdm_drv
2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512
[29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure
"INIT_ENGINE or will automatically get initialized on the first QAT crypto
operation"
Perhaps the init now happens "on the first QAT crypto operation" and is
therefore delayed until after the fork, so if a chroot is configured, the
process can no longer access some files under /dev. Could you run a test
without chroot enabled in the haproxy config?
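One quick way to test this hypothesis without changing the config is to check which device nodes would be visible from inside the chroot directory. This is a sketch only: the function name missing_in_chroot is made up, and the chroot path in the usage line is a placeholder for whatever the haproxy config actually uses.

```shell
#!/bin/sh
# missing_in_chroot ROOT PATH...: print each PATH that does not exist
# underneath ROOT, i.e. that a process chrooted to ROOT could not open.
missing_in_chroot() {
    root=$1
    shift
    for path in "$@"; do
        [ -e "$root$path" ] || printf '%s\n' "$path"
    done
}
```

For example, `missing_in_chroot /path/to/chroot /dev/usdm_drv /dev/qat_adf_ctl /dev/qat_dev_processes` lists the QAT nodes seen in this thread that a late engine init could not reach.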
I removed the chroot and now it initializes properly. Unfortunately, a reload
still leaves a "stuck" HAProxy process :-(
Marcin Deranek
Could you check with "ls -l /proc/<masterpid>/fd" if the "/dev/<qatengine>" is
open multiple times after a reload?
Emeric