Re: [External] Re: QAT intermittent healthcheck errors

Marcin Deranek Fri, 03 May 2019 07:57:12 -0700

Hi Emeric,

On 5/3/19 4:50 PM, Emeric Brun wrote:

I have a test platform here, but I use qat_contig_mem rather than usdm_drv, 
and I cannot reproduce this issue (I'm using QAT 1.5, as the documentation 
recommends for my chip).


I see. I use qat 1.7 and qat-engine 0.5.40.

Anyway, could you recompile the haproxy binary if I provide you with a test 
patch?


Sure, that should not be a problem.

The idea is to perform a deinit in the master to force those '/dev' files to 
be closed at each reload. It may not fix our issue, but this fd leak should 
not happen.


Hopefully this will give us at least some more insight.
Regards,

Marcin Deranek

On 5/3/19 4:21 PM, Marcin Deranek wrote:

Hi Emeric,

It looks like the master leaks a /dev/usdm_drv descriptor on every reload:

# systemctl restart haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x------ 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv

# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x------ 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
lrwx------ 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv

# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x------ 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx------ 1 root root 64 May  3 15:40 10 -> /dev/usdm_drv
lrwx------ 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
lrwx------ 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv

Obviously workers do inherit this from the master. Looking at workers I see the 
following:

* 1st gen:

# ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio19
/dev/uio3
/dev/uio35
/dev/usdm_drv

* 2nd gen:

# ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio23
/dev/uio39
/dev/uio7
/dev/usdm_drv
/dev/usdm_drv

Looks like only /dev/usdm_drv is leaked.
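
The manual comparison above could be scripted. This is only a minimal sketch, assuming bash and a Linux /proc layout; the helper name count_dev_fds is hypothetical and not part of haproxy or the QAT tooling. It takes the fd directory as an argument, so it can be pointed at the master or at any worker:

```shell
#!/usr/bin/env bash
# count_dev_fds (hypothetical helper): print how many descriptors in an
# fd directory (e.g. /proc/<pid>/fd) point at each /dev/* target.
count_dev_fds() {
    local fd_dir=$1 fd target
    for fd in "$fd_dir"/*; do
        # Each entry is a symlink to the open file; skip unreadable ones.
        target=$(readlink "$fd") || continue
        case $target in
            /dev/*) printf '%s\n' "$target" ;;
        esac
    done | sort | uniq -c | awk '{print $1, $2}'
}

# Usage: a count above 1 for /dev/usdm_drv after a reload suggests the leak.
# count_dev_fds "/proc/$(cat haproxy.pid)/fd"
```

Running it before and after each reload and comparing the counts should show the same growth as the listings above.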

Cheers,

Marcin Deranek

On 5/3/19 2:22 PM, Emeric Brun wrote:

Hi Marcin,

On 4/29/19 6:41 PM, Marcin Deranek wrote:

Hi Emeric,

On 4/29/19 3:42 PM, Emeric Brun wrote:

Hi Marcin,

I also have a contact at Intel who told me to try this option on the QAT engine:

--disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
        Disable/Enable the engine from being initialized automatically
        following a fork operation. This is useful in a situation where you
        want to tightly control how many instances are being used for
        processes. For instance if an application forks to start a process
        that does not utilize QAT currently the default behaviour is for the
        engine to still automatically get started in the child using up an
        engine instance. After using this flag either the engine needs to be
        initialized manually using the engine message: INIT_ENGINE or will
        automatically get initialized on the first QAT crypto operation. The
        initialization on fork is enabled by default.


I tried to build QAT Engine with disabled auto init, but that did not help. Now 
I get the following during startup:

2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 Unable to initialize memory file handle /dev/usdm_drv
2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure


" INIT_ENGINE or will automatically get initialized on the first QAT crypto 
operation"

Perhaps the init now happens "on the first QAT crypto operation", i.e. it is 
delayed until after the fork, so if a chroot is configured the process can no 
longer access some files under /dev. Could you run a test without chroot 
enabled in the haproxy config?
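
One way to check the chroot hypothesis before changing the config is to see whether the device nodes are visible from inside the chroot directory. This is a rough sketch, assuming bash; the helper name missing_in_chroot is hypothetical, and the device list is taken from the fd listings earlier in this thread:

```shell
#!/usr/bin/env bash
# missing_in_chroot (hypothetical helper): print each device path that
# does not exist inside the given chroot directory.
missing_in_chroot() {
    local root=$1 dev
    shift
    for dev in "$@"; do
        # A node must exist at <root><dev> to be reachable by a process
        # that opens <dev> after chroot(<root>).
        [ -e "$root$dev" ] || printf '%s\n' "$dev"
    done
}

# Usage (the chroot path is an assumption; take it from haproxy.cfg):
# missing_in_chroot /var/empty /dev/usdm_drv /dev/qat_adf_ctl /dev/qat_dev_processes
```

Any path it prints cannot be opened once the process has entered the chroot, which would be consistent with the qaeOpenFd error on /dev/usdm_drv.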


I removed chroot and now it initializes properly. Unfortunately, a reload still 
leaves a "stuck" HAProxy process :-(

Marcin Deranek


Could you check with "ls -l /proc/<masterpid>/fd" if the "/dev/<qatengine>" is 
open multiple times after a reload?

Emeric
