On 31/10/21(Sun) 15:57, Jeremie Courreges-Anglas wrote:
> On Fri, Oct 08 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> > riscv64.ports was running dpb(1) with two other members in the build
> > cluster.  A few minutes ago I found it in ddb(4).  The report is short,
> > sadly, as the machine doesn't return from the 'bt' command.
> >
> > The machine is acting both as an NFS server and an NFS client.
> >
> > OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
> >
> > login: panic: pool_anic:t: pol_ free l: p mod fiee liat m oxifief:c a2e
> > 07ff0ff fte21ade0 00f ifem c0d
> > 1 07f1f0ffcf2177 010=0 c16ce6 7x090xc52c !
> > 0x9066d21 919 xc1521
> > Stopped at      panic+0xfe:     addi    a0,zero,256
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >   24243  43192     55         0x2          0    0  cc
> > *480349  52543      0        0x11          0    1  perl
> >  480803  72746     55         0x2          0    3  c++
> >  366351   3003     55         0x2          0    2K c++
> > panic() at panic+0xfa
> > panic() at pool_do_get+0x29a
> > pool_do_get() at pool_get+0x76
> > pool_get() at pmap_enter+0x128
> > pmap_enter() at uvm_fault_upper+0x1c2
> > uvm_fault_upper() at uvm_fault+0xb2
> > uvm_fault() at do_trap_user+0x120
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{1}> bt
> > panic() at panic+0xfa
> > panic() at pool_do_get+0x29a
> > pool_do_get() at pool_get+0x76
> > pool_get() at pmap_enter+0x128
> > pmap_enter() at uvm_fault_upper+0x1c2
> > uvm_fault_upper() at uvm_fault+0xb2
> > uvm_fault() at do_trap_user+0x120
> > do_trap_user() at cpu_exception_handler_user+0x7a
> > <hangs>
>
> Another panic on riscv64-1, a new board which doesn't have RTC/I2C
> problems anymore and is acting as a dpb(1) cluster member/NFS client.

Why are both traces ending in pool_do_get()?  Are CPU0 and CPU1 there at
the same time?

This corruption, as well as the one above, arises in the top part of the
fault handler, which already runs concurrently.

Did you try putting KERNEL_LOCK/UNLOCK() dances around uvm_fault() in
trap.c?  That could help figure out if something is still unsafe in
riscv64's pmap.

> <conserver logs>
> panic: pool_do_get: rwobjpl fane c: sool_difget: ragbjpx
> ffef li22 6od0fi; d: pm gd 0 xfffffff222baa0e ^M 8addo
> ffffff0x020b6adf8f; 3cf0e 94ic =p0
> l d4_ef 85 0xof4cl fStopped at        panic+0xfe:     addi    a0,zero,256
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> * 94448  18837     55    0x100002          0    1  bzip2
>  139717  98504     55         0x2          0    0  perl
>  451857  10216     55         0x2          0    3  c++
>  215599  53280     55         0x2          0    2  c++
> panic() at panic+0xfa
> panic() at pool_do_get+0x29a
> pool_do_get() at pool_get+0x76
> pool_get() at _rw_obj_alloc_flags+0x1e
> _rw_obj_alloc_flags() at amap_alloc+0x3a
> amap_alloc() at amap_copy+0x2b6
> amap_copy() at uvm_fault_check+0x1ec
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
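
For reference, the KERNEL_LOCK()/KERNEL_UNLOCK() experiment suggested above
has roughly this shape.  This is only a userland mock, not a diff against
trap.c: the lock is a plain depth counter, uvm_fault() is a stub with a
simplified signature, and user_fault_locked() is a made-up name standing in
for wherever the riscv64 fault path calls uvm_fault().

```c
#include <assert.h>

/* Mock of the big kernel lock: a depth counter, not a real lock. */
static int kernel_lock_depth;
#define KERNEL_LOCK()	do { kernel_lock_depth++; } while (0)
#define KERNEL_UNLOCK()	do {						\
	assert(kernel_lock_depth > 0);					\
	kernel_lock_depth--;						\
} while (0)

/* Mock types; the real definitions live in the kernel headers. */
typedef unsigned long vaddr_t;
struct vm_map {
	int faults;		/* number of faults handled on this map */
};

/* Counts how many mock faults ran while the mock lock was held. */
static int faults_under_lock;

/* Stub uvm_fault(): records the call and whether the lock was held. */
static int
uvm_fault(struct vm_map *map, vaddr_t va, int fault_type, int access_type)
{
	(void)va;
	(void)fault_type;
	(void)access_type;

	map->faults++;
	if (kernel_lock_depth > 0)
		faults_under_lock++;
	return 0;
}

/*
 * Hypothetical helper: serialize the whole fault under the kernel lock,
 * so that pmap_enter() and friends cannot run on two CPUs at once.
 */
static int
user_fault_locked(struct vm_map *map, vaddr_t va, int fault_type,
    int access_type)
{
	int error;

	KERNEL_LOCK();
	error = uvm_fault(map, va, fault_type, access_type);
	KERNEL_UNLOCK();

	return error;
}
```

If the panics stop with the real equivalent of this wrapping in place, that
would point at code reached from uvm_fault() that is not yet safe to run
unlocked; if they continue, the corruption is likely coming from elsewhere.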
> ddb{1}> [-- jca@localhost attached -- Sun Oct 31 08:36:49 2021]
>
> <interactive console prompt>
> ddb{1}> show panic
>  cpu0: pool_do_get: rwobjpl free list modified: page 0xffffffc22b6ad000; item addr 0xffffffc22b6ade88; offset 0x0=0xcf4fef853c0094c7 != 0xcf4fef853cfc94c7
>  cpu3: pool_do_get: rwobjpl free list modified: page 0xffffffc22b6ad000; item addr 0xffffffc22b6ade88; offset 0x0=0xcf4fef853c0094c7 != 0xcf4fef853cfc94c7
> *cpu1: pool_do_get: rwobjpl free list modified: page 0xffffffc22b6ad000; item addr 0xffffffc22b6ade88; offset 0x0=0xcf4fef853c0094c7 != 0xcf4fef853cfc94c7
> ddb{1}> trace
> panic() at panic+0xfa
> panic() at pool_do_get+0x29a
> pool_do_get() at pool_get+0x76
> pool_get() at _rw_obj_alloc_flags+0x1e
> _rw_obj_alloc_flags() at amap_alloc+0x3a
> amap_alloc() at amap_copy+0x2b6
> amap_copy() at uvm_fault_check+0x1ec
> uvm_fault_check() at uvm_fault+0xd0
> uvm_fault() at do_trap_user+0x120
> do_trap_user() at cpu_exception_handler_user+0x7a
> address 0xfffffffffffffffe is invalid
> ddb{1}> mach ddbcpu 0
> Stopped at      ipi_intr+0x22:  c.li    a0,1
> ipi_intr() at ipi_intr+0x1e
> ipi_intr() at riscv_cpu_intr+0x1e
> riscv_cpu_intr() at cpu_exception_handler_supervisor+0x78
> cpu_exception_handler_supervisor() at cnputc+0x2a
> cnputc() at db_putchar+0x322
> db_putchar() at kprintf+0xc36
> kprintf() at db_printf+0x4a
> ddb{0}> trace
> ipi_intr() at ipi_intr+0x1e
> ipi_intr() at riscv_cpu_intr+0x1e
> riscv_cpu_intr() at cpu_exception_handler_supervisor+0x78
> cpu_exception_handler_supervisor() at cnputc+0x2a
> cnputc() at db_putchar+0x322
> db_putchar() at kprintf+0xc36
> kprintf() at db_printf+0x4a
> db_printf() at panic+0x8a
> panic() at pool_do_get+0x29a
> pool_do_get() at pool_get+0x76
> pool_get() at _rw_obj_alloc_flags+0x1e
> _rw_obj_alloc_flags() at amap_alloc+0x3a
> amap_alloc() at amap_copy+0x2b6
> amap_copy() at uvm_fault_check+0x1ec
> uvm_fault_check() at uvm_fault+0xd0
> uvm_fault() at do_trap_user+0x120
> do_trap_user() at cpu_exception_handler_user+0x7a
>
> --
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
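
A quick way to see what changed in the item header from the "show panic"
output above is to XOR the two 64-bit words.  A small sketch in C follows;
the helper names are made up for illustration, and it assumes the first
value is what was found in the item and the second is what pool_do_get()
expected.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the "show panic" output quoted above. */
#define MAGIC_FOUND	UINT64_C(0xcf4fef853c0094c7)	/* assumed: in memory */
#define MAGIC_EXPECTED	UINT64_C(0xcf4fef853cfc94c7)	/* assumed: computed */

/* Mask of the bits that differ between the two words. */
static uint64_t
magic_diff(uint64_t found, uint64_t expected)
{
	return found ^ expected;
}

/* Number of set bits in v (portable, no compiler builtins). */
static int
bit_count(uint64_t v)
{
	int n = 0;

	while (v != 0) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Index of the byte (0 = least significant) with the lowest differing bit. */
static int
first_diff_byte(uint64_t diff)
{
	int i = 0;

	assert(diff != 0);
	while ((diff & 0xff) == 0) {
		diff >>= 8;
		i++;
	}
	return i;
}
```

With the values above the mask is 0x0000000000fc0000: the two words differ
in six bits, all within byte 2, and those bits are clear in the first value.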