We've been here before: The problem is not related to KERNEL_LOCK around uvm_fault.
Jeremie Courreges-Anglas <[email protected]> wrote:

> On Mon, Nov 01 2021, Martin Pieuchot <[email protected]> wrote:
> > On 31/10/21(Sun) 15:57, Jeremie Courreges-Anglas wrote:
> >> On Fri, Oct 08 2021, Jeremie Courreges-Anglas <[email protected]> wrote:
> >> > riscv64.ports was running dpb(1) with two other members in the build
> >> > cluster.  A few minutes ago I found it in ddb(4).  The report is short,
> >> > sadly, as the machine doesn't return from the 'bt' command.
> >> >
> >> > The machine is acting both as an NFS server and and NFS client.
> >> >
> >> > OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
> >> >
> >> > login: panic: pool_anic:t: pol_ free l: p mod fiee liat m oxifief:c a2e
> >> > 07ff0ff fte21ade0 00f ifem c0d
> >> > 1 07f1f0ffcf2177 010=0 c16ce6 7x090xc52c !
> >> > 0x9066d21 919 xc1521
> >> > Stopped at      panic+0xfe:     addi    a0,zero,256
> >> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >> >   24243  43192     55         0x2          0    0  cc
> >> > *480349  52543      0        0x11          0    1  perl
> >> >  480803  72746     55         0x2          0    3  c++
> >> >  366351   3003     55         0x2          0    2K c++
> >> > panic() at panic+0xfa
> >> > panic() at pool_do_get+0x29a
> >> > pool_do_get() at pool_get+0x76
> >> > pool_get() at pmap_enter+0x128
> >> > pmap_enter() at uvm_fault_upper+0x1c2
> >> > uvm_fault_upper() at uvm_fault+0xb2
> >> > uvm_fault() at do_trap_user+0x120
> >> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> >> > reports.  Insufficient info makes it difficult to find and fix bugs.
> >> > ddb{1}> bt
> >> > panic() at panic+0xfa
> >> > panic() at pool_do_get+0x29a
> >> > pool_do_get() at pool_get+0x76
> >> > pool_get() at pmap_enter+0x128
> >> > pmap_enter() at uvm_fault_upper+0x1c2
> >> > uvm_fault_upper() at uvm_fault+0xb2
> >> > uvm_fault() at do_trap_user+0x120
> >> > do_trap_user() at cpu_exception_handler_user+0x7a
> >> > <hangs>
> >>
> >> Another panic on riscv64-1, a new board which doesn't have RTC/I2C
> >> problems anymore and is acting as a dpb(1) cluster member/NFS client.
> >
> > Why are both traces ending in pool_do_get()?  Are CPU0 and CPU1 there at
> > the same time?
> >
> > This corruption as well as the one above arise in the top part of the
> > fault handler which already runs concurrently.  Did you try putting
> > KERNEL_LOCK/UNLOCK() dances around uvm_fault() in trap.c?  That could
> > help figure out if something is still unsafe in riscv64's pmap.
>
> On my riscv64 I did add locking around the two uvm_fault() calls as
> suggested, rebooted, then started building libcrypto and libssl and left
> the place.  Sadly the box is now unreachable (panic?) and will stay as
> is for the next days.  I'll get back to it on sunday.
>
> Since I haven't mentioned it in this thread, clang crashes with SIGSEGV
> often when building ports.  For the two first published bulk builds
> I just restarted the failed ports.
>
> --
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
>
