On 5/7/19 3:35 PM, Marcin Deranek wrote:
> Hi Emeric,
>
> On 5/7/19 1:53 PM, Emeric Brun wrote:
>> On 5/7/19 1:24 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> On 5/7/19 11:44 AM, Emeric Brun wrote:
>>>> Hi Marcin,
>>>>>> As I use HAProxy 1.8 I had to adjust the patch (see
>>>> attachment for end result). Unfortunately after applying the patch there
>>>> is no change in behavior: we still leak /dev/usdm_drv descriptors and have
>>>> "stuck" HAProxy instances after reload..
>>>>>>> Regards,
>>>>>>
>>>>>>
>>>>
>>>> Could you perform a test recompiling the usdm_drv and the engine with this
>>>> patch, it applies on QAT 1.7 but I've no hardware to test this version
>>>> here.
>>>>
>>>> It should fix the fd leak.
>>>
>>> It did fix fd leak:
>>>
>>> # ls -al /proc/2565/fd|fgrep dev
>>> lr-x------ 1 root root 64 May 7 13:15 0 -> /dev/null
>>> lrwx------ 1 root root 64 May 7 13:15 7 -> /dev/usdm_drv
>>>
>>> # systemctl reload haproxy.service
>>> # ls -al /proc/2565/fd|fgrep dev
>>> lr-x------ 1 root root 64 May 7 13:15 0 -> /dev/null
>>> lrwx------ 1 root root 64 May 7 13:15 8 -> /dev/usdm_drv
>>>
>>> # systemctl reload haproxy.service
>>> # ls -al /proc/2565/fd|fgrep dev
>>> lr-x------ 1 root root 64 May 7 13:15 0 -> /dev/null
>>> lrwx------ 1 root root 64 May 7 13:15 9 -> /dev/usdm_drv
>>>
>>> But there are still stuck processes :-( This is with both patches included:
>>> for QAT and HAProxy.
>>> Regards,
>>>
>>> Marcin Deranek
>>
>> Thank you Marcin! Anyway, it was also a bug.
>>
>> Could you run a 'show fd' command on a stuck process, with the attached
>> patch applied?
>
> I did apply this patch and all previous patches (QAT + HAProxy
> ssl_free_engine).
> This is what I got after 1st reload:
>
> show proc
> #<PID> <type> <relative PID> <reloads> <uptime>
> 8025 master 0 1 0d 00h03m25s
> # workers
> 31269 worker 1 0 0d 00h00m39s
> 31270 worker 2 0 0d 00h00m39s
> 31271 worker 3 0 0d 00h00m39s
> 31272 worker 4 0 0d 00h00m39s
> # old workers
> 9286 worker [was: 1] 1 0d 00h03m25s
> 9287 worker [was: 2] 1 0d 00h03m25s
> 9288 worker [was: 3] 1 0d 00h03m25s
> 9289 worker [was: 4] 1 0d 00h03m25s
>
> @!9286 show fd
> 13 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x23eaae0
> iocb=0x4877c0(mworker_accept_wrapper) tmask=0x1 umask=0x0
> 16 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x4e1ab0
> iocb=0x4e1ab0(thread_sync_io_handler) tmask=0xffffffffffffffff umask=0x0
> 20 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1601b840
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 21 : st=0x22(R:pRa W:pRa) ev=0x00(heopi) [lc] cache=0 owner=0x1f0ec4f0
> iocb=0x4ce6e0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x00241300 fe=GLOBAL
> mux=PASS mux_ctx=0x22ad8630
> 1412 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bab1f30
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1413 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x247e5bc0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1414 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x18883650
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1415 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x14476c10
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1416 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11a27850
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1418 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x12008230
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1419 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bb0a570
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1420 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11c94790
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1421 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1449e050
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1422 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1f00c150
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1423 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x15f40550
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1424 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x124b6340
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1425 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11fe4500
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1426 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11c70a60
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1427 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x12572540
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1428 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1249a420
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1430 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11b224a0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1431 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x14f668e0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1432 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1448a630
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1433 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x14f32010
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1434 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1588ed80
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1435 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1efb3e50
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1436 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x10f4cc40
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1437 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bac59b0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1439 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x144b1a70
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1440 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1170a380
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1441 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bad93f0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1442 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bb27ca0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1443 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x158233b0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1444 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x124ba940
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1445 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x15f65850
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1446 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1ab4c9e0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1447 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11e2a7b0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1448 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x16923e40
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1449 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x15e156c0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1450 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1585f040
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1451 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11d0c0f0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1452 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bb00860
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1454 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bb1df90
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1455 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11b16850
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1460 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x115ffe30
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1461 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x16936f10
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1462 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x15fbf350
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1463 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1efd1630
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1465 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bacf6d0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1467 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11079580
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1468 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11e425d0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1469 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x144a7d60
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1472 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x23e6c10
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1474 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x158beac0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1476 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1270e190
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1480 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11f10960
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1484 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x124a4b40
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1488 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11b461d0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1490 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11643280
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1492 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x215945c0
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1499 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x14f68b30
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1500 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x19e59970
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1503 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1fc7b710
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
> 1508 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x16e7cc90
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>
> Regards,
>
> Marcin Deranek
Thank you Marcin,

It shows that haproxy is waiting for an event on all those fds because crypto jobs were launched on the engine, and we can't free a session until its job has finished (freeing it earlier would result in a segfault). So the processes are stuck: they cannot free their sessions because the engine never signals the end of those jobs via the async fd.

I could not reproduce this issue on QAT 1.5, so I will try to discuss it with the Intel guys to find out why this behavior changed in v1.7 and what we can do about it.

R,
Emeric

