On 31/10/21(Sun) 15:57, Jeremie Courreges-Anglas wrote:
> On Fri, Oct 08 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> > riscv64.ports was running dpb(1) with two other members in the build
> > cluster.  A few minutes ago I found it in ddb(4).  The report is short,
> > sadly, as the machine doesn't return from the 'bt' command.
> >
> > The machine is acting as both an NFS server and an NFS client.
> >
> > OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
> >
> > login: panic: pool_anic:t: pol_ free l: p mod fiee liat m  oxifief:c a2e 
> > 07ff0ff fte21ade0 00f ifem c0d
> > 1 07f1f0ffcf2177 010=0 c16ce6 7x090xc52c !
> > 0x9066d21 919 xc1521
> > Stopped at      panic+0xfe:     addi    a0,zero,256
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >   24243  43192     55         0x2          0    0  cc
> > *480349  52543      0        0x11          0    1  perl
> >  480803  72746     55         0x2          0    3  c++
> >  366351   3003     55         0x2          0    2K c++
> > panic() at panic+0xfa
> > panic() at pool_do_get+0x29a
> > pool_do_get() at pool_get+0x76
> > pool_get() at pmap_enter+0x128
> > pmap_enter() at uvm_fault_upper+0x1c2
> > uvm_fault_upper() at uvm_fault+0xb2
> > uvm_fault() at do_trap_user+0x120
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{1}> bt
> > panic() at panic+0xfa
> > panic() at pool_do_get+0x29a
> > pool_do_get() at pool_get+0x76
> > pool_get() at pmap_enter+0x128
> > pmap_enter() at uvm_fault_upper+0x1c2
> > uvm_fault_upper() at uvm_fault+0xb2
> > uvm_fault() at do_trap_user+0x120
> > do_trap_user() at cpu_exception_handler_user+0x7a
> > <hangs>
> 
> Another panic on riscv64-1, a new board which doesn't have RTC/I2C
> problems anymore and is acting as a dpb(1) cluster member/NFS client.

Why are both traces ending in pool_do_get()?  Are CPU0 and CPU1 there at
the same time?

Both this corruption and the one above arise in the top part of the
fault handler, which already runs concurrently.  Did you try putting
KERNEL_LOCK()/KERNEL_UNLOCK() dances around uvm_fault() in trap.c?  That
could help figure out whether something is still unsafe in riscv64's pmap.
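
To make the experiment concrete, here is a minimal sketch of the shape I
have in mind.  It is a userland mock, not the actual trap.c code: the
lock macros are replaced by a counter, and uvm_fault_stub() and
handle_user_fault() are hypothetical names standing in for uvm_fault()
and its call site in the trap handler.

```c
/*
 * Userland mock of "take the kernel lock around the fault handler".
 * In the kernel, KERNEL_LOCK()/KERNEL_UNLOCK() are the real big-lock
 * macros; here they only bump a counter so the sketch compiles alone.
 */
static int kernel_lock_depth;		/* hypothetical: mock lock nesting */
#define KERNEL_LOCK()	(kernel_lock_depth++)
#define KERNEL_UNLOCK()	(kernel_lock_depth--)

/* Hypothetical stand-in for uvm_fault(); records if the lock was held. */
static int fault_ran_locked;

static int
uvm_fault_stub(unsigned long va)
{
	(void)va;
	fault_ran_locked = (kernel_lock_depth > 0);
	return 0;			/* 0 means the fault was resolved */
}

/* The experiment: serialize the whole fault path under the kernel lock. */
static int
handle_user_fault(unsigned long va)
{
	int error;

	KERNEL_LOCK();
	error = uvm_fault_stub(va);
	KERNEL_UNLOCK();
	return error;
}
```

If the panics stop with the fault path serialized like this, that points
at a missing lock somewhere under uvm_fault() rather than at the hardware.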

> <conserver logs>
> panic: pool_do_get: rwobjpl fane c: sool_difget:  ragbjpx 
> ffef li22 6od0fi; d: pm gd 0 xfffffff222baa0e ^M        8addo 
> ffffff0x020b6adf8f; 3cf0e 94ic  =p0 
> l d4_ef 85 0xof4cl fStopped at      panic+0xfe:     addi    a0,zero,256    
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> * 94448  18837     55    0x100002          0    1  bzip2
>  139717  98504     55         0x2          0    0  perl
>  451857  10216     55         0x2          0    3  c++
>  215599  53280     55         0x2          0    2  c++
> panic() at panic+0xfa
> panic() at pool_do_get+0x29a
> pool_do_get() at pool_get+0x76
> pool_get() at _rw_obj_alloc_flags+0x1e
> _rw_obj_alloc_flags() at amap_alloc+0x3a
> amap_alloc() at amap_copy+0x2b6
> amap_copy() at uvm_fault_check+0x1ec
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{1}> [-- jca@localhost attached -- Sun Oct 31 08:36:49 2021]
> 
> <interactive console prompt>
> ddb{1}> show panic
>  cpu0: pool_do_get: rwobjpl free list modified: page 0xffffffc22b6ad000; item addr 0xffffffc22b6ade88; offset 0x0=0xcf4fef853c0094c7 != 0xcf4fef853cfc94c7
>  cpu3: pool_do_get: rwobjpl free list modified: page 0xffffffc22b6ad000; item addr 0xffffffc22b6ade88; offset 0x0=0xcf4fef853c0094c7 != 0xcf4fef853cfc94c7
> *cpu1: pool_do_get: rwobjpl free list modified: page 0xffffffc22b6ad000; item addr 0xffffffc22b6ade88; offset 0x0=0xcf4fef853c0094c7 != 0xcf4fef853cfc94c7
> ddb{1}> trace
> panic() at panic+0xfa
> panic() at pool_do_get+0x29a
> pool_do_get() at pool_get+0x76
> pool_get() at _rw_obj_alloc_flags+0x1e
> _rw_obj_alloc_flags() at amap_alloc+0x3a
> amap_alloc() at amap_copy+0x2b6
> amap_copy() at uvm_fault_check+0x1ec
> uvm_fault_check() at uvm_fault+0xd0
> uvm_fault() at do_trap_user+0x120
> do_trap_user() at cpu_exception_handler_user+0x7a
> address 0xfffffffffffffffe is invalid
> ddb{1}> mach ddbcpu 0
> Stopped at      ipi_intr+0x22:  c.li    a0,1
> ipi_intr() at ipi_intr+0x1e
> ipi_intr() at riscv_cpu_intr+0x1e
> riscv_cpu_intr() at cpu_exception_handler_supervisor+0x78
> cpu_exception_handler_supervisor() at cnputc+0x2a
> cnputc() at db_putchar+0x322
> db_putchar() at kprintf+0xc36
> kprintf() at db_printf+0x4a
> ddb{0}> trace
> ipi_intr() at ipi_intr+0x1e
> ipi_intr() at riscv_cpu_intr+0x1e
> riscv_cpu_intr() at cpu_exception_handler_supervisor+0x78
> cpu_exception_handler_supervisor() at cnputc+0x2a
> cnputc() at db_putchar+0x322
> db_putchar() at kprintf+0xc36
> kprintf() at db_printf+0x4a
> db_printf() at panic+0x8a
> panic() at pool_do_get+0x29a
> pool_do_get() at pool_get+0x76
> pool_get() at _rw_obj_alloc_flags+0x1e
> _rw_obj_alloc_flags() at amap_alloc+0x3a
> amap_alloc() at amap_copy+0x2b6
> amap_copy() at uvm_fault_check+0x1ec
> uvm_fault_check() at uvm_fault+0xd0
> uvm_fault() at do_trap_user+0x120
> do_trap_user() at cpu_exception_handler_user+0x7a
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 
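
One detail in the "show panic" output quoted above: the two 64-bit words
in the rwobjpl message differ in a single byte.  A small sketch to check
that, where diff_bits() and first_diff_byte() are helper names made up
for this mail and are not part of the pool code:

```c
#include <stdint.h>

/* XOR of the two words: set bits mark the positions where they differ. */
static uint64_t
diff_bits(uint64_t a, uint64_t b)
{
	return a ^ b;
}

/*
 * Index of the lowest differing byte (0 = least significant),
 * or -1 if the two words are identical.
 */
static int
first_diff_byte(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	int i;

	for (i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			return i;
	}
	return -1;
}
```

Fed with 0xcf4fef853c0094c7 and 0xcf4fef853cfc94c7, diff_bits() gives
0xfc0000, so only byte 2 (counting from the least significant byte)
differs: 0x00 on one side, 0xfc on the other.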
