Re: [External] Re: QAT intermittent healthcheck errors

Emeric Brun Mon, 29 Apr 2019 06:00:01 -0700

Hi Marcin,

On 4/19/19 3:26 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/18/19 4:35 PM, Emeric Brun wrote:
>>> An other interesting trace would be to perform a "show sess" command on a 
>>> stucked process through the master cli.
>>
>> And also the "show fd"
> 
> Here it is:
> 
> show proc
> #<PID>          <type>          <relative PID>  <reloads>       <uptime>
> 13409           master          0               1               0d 00h03m30s
> # workers
> 15084           worker          1               0               0d 00h03m20s
> 15085           worker          2               0               0d 00h03m20s
> 15086           worker          3               0               0d 00h03m20s
> 15087           worker          4               0               0d 00h03m20s
> # old workers
> 13415           worker          [was: 1]        1               0d 00h03m30s
> 13416           worker          [was: 2]        1               0d 00h03m30s
> 13417           worker          [was: 3]        1               0d 00h03m30s
> 13418           worker          [was: 4]        1               0d 00h03m30s
> 
> @!13415 show sess
> 0x4eee9c0: proto=sockpair ts=0a age=0s calls=1 rq[f=40c0c220h,i=0,an=00h,rx=,wx=,ax=] rp[f=80008000h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=20,ex=] s1=[7,4018h,fd=-1,ex=] exp=
> 
> @!13415 show fd
>      13 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x1a74ae0 iocb=0x487760(mworker_accept_wrapper) tmask=0x1 umask=0x0
>      16 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x4e19f0 iocb=0x4e19f0(thread_sync_io_handler) tmask=0xffffffffffffffff umask=0x0
>      20 : st=0x22(R:pRa W:pRa) ev=0x00(heopi) [lc] cache=0 owner=0x4fe1860 iocb=0x4ce620(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x00241300 fe=GLOBAL mux=PASS mux_ctx=0x47dfd50
>      87 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3ec1150 iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
>      88 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3c237d0 iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
> 
> @!13416 show sess
> 0x48f2990: proto=sockpair ts=0a age=0s calls=1 rq[f=40c0c220h,i=0,an=00h,rx=,wx=,ax=] rp[f=80008000h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=20,ex=] s1=[7,4018h,fd=-1,ex=] exp=
> 
> @!13416 show fd
>      15 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x34c1540 iocb=0x487760(mworker_accept_wrapper) tmask=0x1 umask=0x0
>      16 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x4e19f0 iocb=0x4e19f0(thread_sync_io_handler) tmask=0xffffffffffffffff umask=0x0
>      20 : st=0x22(R:pRa W:pRa) ev=0x00(heopi) [lc] cache=0 owner=0x4b3cff0 iocb=0x4ce620(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x00241300 fe=GLOBAL mux=PASS mux_ctx=0x4f0e510
>      75 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3a6b2f0 iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
>      76 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x43a34e0 iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
> 
> Marcin Deranek


FDs 87, 88, 75 and 76 appear to be async engine FDs that should have been
cleaned up. I will dig into that.
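
In case it helps to compare dumps from several stuck workers, here is a rough
Python sketch that lists the FDs whose I/O callback is reported as "unknown" in
"show fd" output. The function names (parse_show_fd, fds_with_unknown_handler)
are only illustrative, and an "unknown" iocb is just a heuristic for spotting
candidates: it does not prove that an FD belongs to the async engine.

```python
import re

# Start of one "show fd" entry, e.g. "87 : st=0x05(...) ...".
ENTRY_RE = re.compile(r"^\s*(\d+)\s*:\s*(.*)$")
# I/O callback field, e.g. "iocb=0x4f5d30(unknown)".
IOCB_RE = re.compile(r"iocb=0x[0-9a-fA-F]+\(([^)]*)\)")


def parse_show_fd(text):
    """Map each FD number to its iocb handler name (None if no iocb field).

    Leading '>' quote markers are stripped, and a line that does not start a
    new entry is treated as the wrapped continuation of the previous one.
    """
    records = []  # each item is [fd, joined entry text]
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()
        match = ENTRY_RE.match(line)
        if match:
            records.append([int(match.group(1)), match.group(2)])
        elif records and line:
            records[-1][1] += " " + line

    handlers = {}
    for fd, body in records:
        match = IOCB_RE.search(body)
        handlers[fd] = match.group(1) if match else None
    return handlers


def fds_with_unknown_handler(text):
    """Return the sorted FDs whose iocb handler is reported as 'unknown'."""
    return sorted(fd for fd, name in parse_show_fd(text).items()
                  if name == "unknown")


# Excerpt of the "show fd" output for old worker 13415 quoted above.
SAMPLE = """\
>      13 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x1a74ae0
iocb=0x487760(mworker_accept_wrapper) tmask=0x1 umask=0x0
>      87 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3ec1150
iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
>      88 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3c237d0
iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
"""

print(fds_with_unknown_handler(SAMPLE))  # prints [87, 88]
```

Running it against the dump of each old worker should show whether the same
kind of leftover FDs is present in all of them.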

I also have a contact at Intel who suggested trying this option on the QAT engine:

> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>     Disable/Enable the engine from being initialized automatically following a
>     fork operation. This is useful in a situation where you want to tightly
>     control how many instances are being used for processes. For instance if an
>     application forks to start a process that does not utilize QAT currently
>     the default behaviour is for the engine to still automatically get started
>     in the child using up an engine instance. After using this flag either the
>     engine needs to be initialized manually using the engine message:
>     INIT_ENGINE or will automatically get initialized on the first QAT crypto
>     operation. The initialization on fork is enabled by default.


R,
Emeric
