On Fri, Jun 27, 2025 at 3:34 AM Martin Pieuchot <m...@grenadille.net> wrote:
>
> On 26/06/25(Thu) 11:02, K R wrote:
> > On Wed, Jun 25, 2025 at 1:30 PM K R <daharmaster...@gmail.com> wrote:
> > >
> > > [...]
> > > > Hi Alexander,
> > > >
> > > > The good news: I can consistently reproduce the hang problem. But the
> > > > bad news is that even with a WITNESS kernel and kern.witness.watch=2
> > > > (or even 3) I don't see any message or kernel panic.
>
> Do you mind sharing your recipe to reproduce the hang?
Now a protection fault trap, below. This is a -current kernel compiled
with WITNESS and MP_LOCKDEBUG. Please let me know if you need me to
disable these options or enable something else.

Thanks,
--Kor

kernel: protection fault trap, code=0
Stopped at Xipi_invlrange_pcid+0x37:
show panic
the kernel did not panic
ddb{11}> trace
Xipi_invlrange_pcid() at Xipi_invlrange_pcid+0x37
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x77e7d698c600, count: -2
ddb{11}> show locks
ddb{11}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  32128632 VM pages: 48715 active, 194971 inactive, 1 wired, 12073171 free (1793670 zero)
  freemin=1070954, free-target=1427938, inactive-target=5649476, wired-max=10709544
  faults=-61402478, traps=1000373307, intrs=109644593, ctxswitch=-710638854 fpuswitch=0
  softint=1129844981, syscalls=-541527240, kmapent=11
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    relocks=313014(4555), upgrades=0(0) anget(retries)=19314639(0), amapcopy=1142296398
    neighbor anon/obj pg=487865/1083289, gets(lock/unlock)=672813/317589
    cases: anon=19173423, anoncow=141216, obj=596517, prcopy=71721, przero=-81386369
  daemon and swap counts:
    woke=1576, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=16777883, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0xffffffff82b406d8
ddb{11}> show bcstats
Current Buffer Cache status:
numbufs 1328555 busymapped 1, delwri 4
kvaslots 5221 avail kva slots 5220
bufpages 19307018, dmapages 83470, dirtypages 52
pendingreads 1, pendingwrites 0
highflips 105286762, highflops 0, dmaflips 27731
ddb{11}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  91867  43631      0         0x3  0x4000000    2  semaphore
 521718  23585      0         0x3  0x4000000   12  semaphore
 101939  23757      0         0x3  0x4000000    1  semaphore
 207251  37453      0         0x3  0x4000000   10  semaphore
 100237  19899      0         0x3  0x4000000    7  semaphore
 397618  64074      0         0x3  0x4000000    8  semaphore
 147922  39167      0         0x3  0x4000000    4  semaphore
  41621  25216      0         0x3  0x4000000   14  semaphore
*111074  74925      0         0x3          0   11  nfdump
  63333  74925      0         0x3  0x4000000    0  nfdump
 413988  10639      0         0x3          0   15  nfdump
 194585  10639      0         0x3  0x4000000    3  nfdump
  79353  71900      0         0x3          0    6  nfdump
 171795  71900      0         0x3  0x4000000   13K nfdump

>
> > > > Any additional suggestion in order to increase debug information or
> > > > force the machine to go into ddb?
> > >
> > > I tried a MP_LOCKDEBUG kernel and finally got a panic, shown below.
> > > Sorry for not including 'show locks' and all the cpus, but my serial
> >
> > Another hang, now with show locks:
> >
> > ddb{2}> show locks
> > exclusive rwlock amaplk r = 0 (0xfffffd9a244e4e78)
> > exclusive rwlock vmmaplk r = 0 (0xfffffd9c93c4ca90)
> > --
> > ddb{2}> ps /o
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >  322239  87810      0         0x3  0x4000000    0  semaphore
> >   86705  31331      0         0x3  0x4000000   11  semaphore
> >  383438  44550      0         0x3  0x4000000   14  semaphore
> >  257640  77711      0         0x3  0x4000000   13  semaphore
> >  450354  41159      0         0x3  0x4000000    7  semaphore
> >  427275  25212      0         0x3  0x4000000   12  semaphore
> >  227039  47648      0         0x3  0x4000000    8  semaphore
> >   28812  15439      0         0x3  0x4000000    6  semaphore
> >   37644  76378      0    0x100003          0    3K top
> > *305600   7089      0         0x3          0    2  nfdump
> >  192362   7089      0         0x3  0x4000000    9  nfdump
> >  160550  56545      0         0x3  0x4000000    4  nfdump
> >  367292  98858      0         0x3  0x4000000    1  nfdump
> > --
> > ddb{2}> trace
> > db_enter() at db_enter+0x14
> > pmap_do_remove(fffffd9c7aa25100,d6734d0000,d6734e0000,0) at pmap_do_remove+0x4a9
> > uvm_unmap_kill_entry_withlock(fffffd9c93c4c990,fffffd9c80c83258,1) at uvm_unmap_kill_entry_withlock+0x133
> > uvm_unmap_remove(fffffd9c93c4c990,d6734a0000,d6739a0000,ffff800055feaed0,0,1,7ff574ce4a271e31) at uvm_unmap_remove+0x32f
> > sys_munmap(ffff8000fffe6070,ffff800055feafd0,ffff800055feaf50) at sys_munmap+0x10b
> > syscall(ffff800055feafd0) at syscall+0x5f9
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7365c97d4f10, count: -7
> > --
> > ddb{2}> machine ddbcpu 0
> > Stopped at x86_ipi_db+0x16: leave
> >
> > ddb{0}> trace
> > x86_ipi_db(ffffffff829e6ff0) at x86_ipi_db+0x16
> > x86_ipi_handler() at x86_ipi_handler+0x80
> > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
> > __mp_lock(ffffffff82b2d348) at __mp_lock+0xbe
> > softintr_dispatch(0) at softintr_dispatch+0xeb
> > dosoftint(0) at dosoftint+0x3f
> > Xsoftclock() at Xsoftclock+0x27
> > end of kernel
> > end trace frame: 0x71789b60c60, count: -7
> > --
> > ddb{1}> machine ddbcpu 2
> > Stopped at db_enter+0x14: popq %rbp
> > ddb{2}> trace
> > db_enter() at db_enter+0x14
> > pmap_do_remove(fffffd9c7aa25100,d6734d0000,d6734e0000,0) at pmap_do_remove+0x4a9
> > uvm_unmap_kill_entry_withlock(fffffd9c93c4c990,fffffd9c80c83258,1) at uvm_unmap_kill_entry_withlock+0x133
> > uvm_unmap_remove(fffffd9c93c4c990,d6734a0000,d6739a0000,ffff800055feaed0,0,1,7ff574ce4a271e31) at uvm_unmap_remove+0x32f
> > sys_munmap(ffff8000fffe6070,ffff800055feafd0,ffff800055feaf50) at sys_munmap+0x10b
> > syscall(ffff800055feafd0) at syscall+0x5f9
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7365c97d4f10, count: -7
> > --
> >
> > > console session froze before that.
> > >
> > > Hope it helps,
> > > --Kor
> > >
> > > pmap_tlb_shootwait: spun out
> > > Stopped at db_enter+0x14: popq %rbp
> > > ddb{13}>
> > > --
> > > ddb{13}> show panic
> > > the kernel did not panic
> > > --
> > > ddb{13}> ps /o
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >  307801  52176      0         0x2  0x4000000   10  semaphore
> > >  410431  54690      0         0x2  0x4000000   15  semaphore
> > >  257513  26918      0         0x2  0x4000000   11  semaphore
> > >  138054  68532      0         0x2  0x4000000    5  semaphore
> > >  203059   2408      0         0x2  0x4000000    9  semaphore
> > >   89141   5998      0         0x2  0x4000000    8  semaphore
> > >  263439  99498      0         0x2  0x4000000    7  semaphore
> > >  477129  15941      0         0x2  0x4000000   12  semaphore
> > >   41845  21800      0         0x3  0x4000000    2  nfdump
> > >  371022  21800      0         0x3  0x4000000    0  nfdump
> > > *391905  66338      0         0x3  0x4000000   13  nfdump
> > >  367615  66338      0         0x3  0x4000000    6  nfdump
> > >  394180  56727      0         0x3  0x4000000    3  nfdump
> > >   46184  33300      0    0x100003          0    1K top
> > > --
> > > ddb{13}> trace
> > > db_enter() at db_enter+0x14
> > > pmap_page_remove(fffffd8184bdc408) at pmap_page_remove+0x38e
> > > uvm_anfree_list(fffffd9c88e2be50,0) at uvm_anfree_list+0x63
> > > amap_wipeout(fffffd9ca390fc88) at amap_wipeout+0x198
> > > uvm_unmap_detach(ffff80005608fcc0,0) at uvm_unmap_detach+0xbe
> > > sys_munmap(ffff800055cda540,ffff80005608fdc0,ffff80005608fd40) at sys_munmap+0x185
> > > syscall(ffff80005608fdc0) at syscall+0x5f9
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x123cf0e2b80, count: -8
> > > --
> > > ddb{13}> machine ddbcpu 1
> > > Stopped at x86_ipi_db+0x16: leave
> > >
> > > ddb{1}> trace
> > > x86_ipi_db(ffff8000552aaff0) at x86_ipi_db+0x16
> > > x86_ipi_handler() at x86_ipi_handler+0x80
> > > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
> > > pmap_enter(fffffd9cd8988d00,7103880b6000,1c7767a000,3,33) at pmap_enter+0x703
> > > uvm_fault_upper(ffff800055da4928,ffff800055da4960,ffff800055da4820) at uvm_fault_upper+0x1e0
> > > uvm_fault(fffffd9ca1ab8b58,7103880b6000,2,3) at uvm_fault+0xce
> > > uvm_fault_wire(fffffd9ca1ab8b58,7103880b6000,7103880b7000,3) at uvm_fault_wire+0x58
> > > uvm_map_pageable_wire(fffffd9ca1ab8b58,fffffd9c8f522700,fffffd9ca0915ab8,0,3,0) at uvm_map_pageable_wire+0x19d
> > > sysctl_vslock(7103880b6d68,18) at sysctl_vslock+0x9b
> > > sys_sysctl(ffff800055d61c60,ffff800055da4c80,ffff800055da4c00) at sys_sysctl+0x187
> > > syscall(ffff800055da4c80) at syscall+0x5f9
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7103880b6dc0, count: -12
> > > --
> > >
> > > [...]