Hi Marcin,
Good, so we're making progress!
I have a test platform here, but it uses qat_contig_mem rather than usdm_drv,
and I can't reproduce this issue on it (I'm using QAT 1.5, as the doc says to
use with my chip).
Anyway, could you rebuild the haproxy binary if I send you a test patch?
The idea is to perform a deinit in the master to force those '/dev' entries
to be closed at each reload. Perhaps it won't fix our issue, but this fd leak
should not happen in any case.
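
To check whether the patch helps, something like the sketch below could be
used to count how many fds of the master point at the device, before and
after each reload. It is only a rough helper: the function name
count_dev_fds is mine (not part of haproxy or QAT), and it assumes a Linux
/proc and that the pid file is readable, as in your commands.

```shell
#!/bin/sh
# count_dev_fds PID PATH (hypothetical helper name)
# Print how many open file descriptors of PID resolve to PATH,
# by reading the symlinks under /proc/PID/fd.
count_dev_fds() {
    pid=$1
    path=$2
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails if the fd vanished meanwhile (or the glob
        # matched nothing); skip such entries.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "$path" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, running count_dev_fds "$(cat haproxy.pid)" /dev/usdm_drv once
after the restart and again after each "systemctl reload haproxy.service"
should show whether the number still grows. With the deinit in place I would
expect it to stay constant, but that is what we need to verify.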
R,
Emeric
On 5/3/19 4:21 PM, Marcin Deranek wrote:
> Hi Emeric,
>
> It looks like on every reload master leaks /dev/usdm_drv device:
>
> # systemctl restart haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
> lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
>
> # systemctl reload haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
> lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
> lrwx------ 1 root root 64 May 3 15:40 9 -> /dev/usdm_drv
>
> # systemctl reload haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x------ 1 root root 64 May 3 15:40 0 -> /dev/null
> lrwx------ 1 root root 64 May 3 15:40 10 -> /dev/usdm_drv
> lrwx------ 1 root root 64 May 3 15:40 7 -> /dev/usdm_drv
> lrwx------ 1 root root 64 May 3 15:40 9 -> /dev/usdm_drv
>
> Obviously workers do inherit this from the master. Looking at workers I see
> the following:
>
> * 1st gen:
>
> # ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
> /dev/null
> /dev/null
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_dev_processes
> /dev/uio19
> /dev/uio3
> /dev/uio35
> /dev/usdm_drv
>
> * 2nd gen:
>
> # ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
> /dev/null
> /dev/null
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_dev_processes
> /dev/uio23
> /dev/uio39
> /dev/uio7
> /dev/usdm_drv
> /dev/usdm_drv
>
> Looks like only /dev/usdm_drv is leaked.
>
> Cheers,
>
> Marcin Deranek
>
> On 5/3/19 2:22 PM, Emeric Brun wrote:
>> Hi Marcin,
>>
>> On 4/29/19 6:41 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> On 4/29/19 3:42 PM, Emeric Brun wrote:
>>>> Hi Marcin,
>>>>
>>>>>
>>>>>> I've also a contact at intel who told me to try this option on the qat
>>>>>> engine:
>>>>>>
>>>>>>> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>>>>>>> Disable/Enable the engine from being initialized automatically
>>>>>>> following a
>>>>>>> fork operation. This is useful in a situation where you want to
>>>>>>> tightly
>>>>>>> control how many instances are being used for processes. For
>>>>>>> instance if an
>>>>>>> application forks to start a process that does not utilize QAT
>>>>>>> currently
>>>>>>> the default behaviour is for the engine to still automatically
>>>>>>> get started
>>>>>>> in the child using up an engine instance. After using this flag
>>>>>>> either the
>>>>>>> engine needs to be initialized manually using the engine message:
>>>>>>> INIT_ENGINE or will automatically get initialized on the first
>>>>>>> QAT crypto
>>>>>>> operation. The initialization on fork is enabled by default.
>>>>>
>>>>> I tried to build QAT Engine with disabled auto init, but that did not
>>>>> help. Now I get the following during startup:
>>>>>
>>>>> 2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753
>>>>> Unable to initialize memory file handle /dev/usdm_drv
>>>>> 2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512
>>>>> [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure
>>>>
>>>> " INIT_ENGINE or will automatically get initialized on the first QAT
>>>> crypto operation"
>>>>
>>>> Perhaps the init appears "with first qat crypto operation" and is delayed
>>>> after the fork so if a chroot is configured, it doesn't allow some accesses
>>>> to /dev. Could you perform a test in that case without chroot enabled in
>>>> the haproxy config ?
>>>
>>> Removed chroot and now it initializes properly. Unfortunately reload still
>>> causes "stuck" HAProxy process :-(
>>>
>>> Marcin Deranek
>>
>> Could you check with "ls -l /proc/<masterpid>/fd" if the "/dev/<qatengine>"
>> is open multiple times after a reload?
>>
>> Emeric
>>