On Fri, Oct 08 2021, Mark Kettenis <[email protected]> wrote:
>> From: Jeremie Courreges-Anglas <[email protected]>
>> Date: Fri, 08 Oct 2021 18:19:47 +0200
>>
>> riscv64.ports was running dpb(1) with two other members in the build
>> cluster. A few minutes ago I found it in ddb(4). The report is short,
>> sadly, as the machine doesn't return from the 'bt' command.
>>
>> The machine is acting both as an NFS server and an NFS client.
>>
>> OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
>>
>> login: panic: pool_anic:t: pol_ free l: p mod fiee liat m oxifief:c a2e
>> 07ff0ff fte21ade0 00f ifem c0d
>> 1 07f1f0ffcf2177 010=0 c16ce6 7x090xc52c !
>> 0x9066d21 919 xc1521
>> Stopped at panic+0xfe: addi a0,zero,256
>> TID PID UID PRFLAGS PFLAGS CPU COMMAND
>> 24243 43192 55 0x2 0 0 cc
>> *480349 52543 0 0x11 0 1 perl
>> 480803 72746 55 0x2 0 3 c++
>> 366351 3003 55 0x2 0 2K c++
>> panic() at panic+0xfa
>> panic() at pool_do_get+0x29a
>> pool_do_get() at pool_get+0x76
>> pool_get() at pmap_enter+0x128
>> pmap_enter() at uvm_fault_upper+0x1c2
>> uvm_fault_upper() at uvm_fault+0xb2
>> uvm_fault() at do_trap_user+0x120
>> https://www.openbsd.org/ddb.html describes the minimum info required in bug
>> reports. Insufficient info makes it difficult to find and fix bugs.
>> ddb{1}> bt
>> panic() at panic+0xfa
>> panic() at pool_do_get+0x29a
>> pool_do_get() at pool_get+0x76
>> pool_get() at pmap_enter+0x128
>> pmap_enter() at uvm_fault_upper+0x1c2
>> uvm_fault_upper() at uvm_fault+0xb2
>> uvm_fault() at do_trap_user+0x120
>> do_trap_user() at cpu_exception_handler_user+0x7a
>> <hangs>
>>
>> The conserver logs for this console provide a hint about when it
>> happened:
>>
>> --8<--
>> [-- MARK -- Fri Oct 8 08:00:00 2021]
>> [-- MARK -- Fri Oct 8 09:00:00 2021]
>> [-- MARK -- Fri Oct 8 10:00:00 2021]
>> bt
>> ^Mpanic() at panic+0xfa
>> ^Mpanic() at pool_do_get+0x29a
>> ...
>> -->8--
>>
>> It seems that Theo was plugging/unplugging usb cables at that time.
>> I asked Theo to reboot the machine as I couldn't get more useful output.
>
> Thanks for the heads up. Some sort of memory corruption, but no real
> clues what caused it.

Another one, maybe a similar cause, maybe not. :-/
[...]
OpenBSD 7.0-current (GENERIC.MP) #84: Mon Oct 18 01:23:24 MDT 2021
[email protected]:/usr/src/sys/arch/riscv64/compile/GENERIC.MP
[...]
OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
login: t[0] == 0x0000000000000000
t[1] == 0xffffffc00034feb2
t[2] == 0xffffffc227cd9630
t[3] == 0xffffffc0008cf1b0
t[4] == 0x0000000000000022
t[5] == 0x0000000000000000
t[6] == 0x000000007c9bd777
s[0] == 0xffffffc227cd9680
s[1] == 0xffffffc2229d8d28
s[2] == 0xffffffc000a5e9a8
s[3] == 0xffffffc2229ad6a0
s[4] == 0xffffffc0008ff1f8
s[5] == 0x0000000000000000
s[6] == 0xffffffc22a21a050
s[7] == 0x0000000000000000
s[8] == 0xffffffc000a20d30
s[9] == 0xffffffc000a25718
s[10] == 0xffffffc000a8f6a0
s[11] == 0xffffffc000a25714
a[0] == 0x95b8040228044314
a[1] == 0x95b8040228044314
a[2] == 0xffffffc2229d8d28
a[3] == 0x0000000000000001
a[4] == 0xffffffc023027800
a[5] == 0x0000000000000000
a[6] == 0x0000000000000003
a[7] == 0xffffffc0008cf1a0
sepc == 0xffffffc0002f043e
sstatus == 0x0000000200000120
stval == 0xffffff822804431c
scause == 0x000000000000000d
panic: Fatal page fault at 0xffffffc0002f043e: 0xffffff822804431c
Stopped at panic+0xfe: addi a0,zero,256
TID PID UID PRFLAGS PFLAGS CPU COMMAND
469890 40238 55 0x100002 0 2 sh
*380568 80457 55 0x100002 0 3K touch
299235 73888 55 0x2 0 1 perl
15710 89701 56 0x100002 0 0 ftp
panic() at panic+0xfa
panic() at do_trap_supervisor+0x232
dump_regs() at cpu_exception_handler_supervisor+0x78
cpu_exception_handler_supervisor() at pool_put+0x30
pool_put() at ffs_reclaim+0x5c
ffs_reclaim() at VOP_RECLAIM+0x32
VOP_RECLAIM() at vclean+0x122
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{3}>
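
For anyone reading the register dump above, here is a minimal sketch that
decodes scause and a few sstatus bits using the field layout from the RISC-V
privileged spec, and checks whether an address is canonical. The helper names
(decode_scause, decode_sstatus, is_canonical_sv39) are made up for this note,
and the Sv39 address layout is an assumption; the 0xffffffc0... kernel
addresses in the dump are at least consistent with it.

```python
# Synchronous exception codes (scause with the interrupt bit clear),
# as listed in the RISC-V privileged specification.
EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_scause(scause):
    """Return a short description of a 64-bit scause value."""
    interrupt = bool(scause >> 63)          # top bit set means interrupt
    code = scause & ((1 << 63) - 1)
    if interrupt:
        return f"interrupt {code}"
    return EXCEPTIONS.get(code, f"reserved/unknown exception {code}")


def decode_sstatus(sstatus):
    """Extract a few sstatus fields (only a subset, not the whole register)."""
    return {
        "SIE": (sstatus >> 1) & 1,          # supervisor interrupts enabled now
        "SPIE": (sstatus >> 5) & 1,         # ... and before the trap
        "SPP": "supervisor" if (sstatus >> 8) & 1 else "user",
        "UXL": (sstatus >> 32) & 3,         # 2 means 64-bit U-mode
    }


def is_canonical_sv39(va):
    """Assumes Sv39: bits 63..39 must all equal bit 38."""
    top = va >> 38                          # bits 63..38, 26 bits wide
    return top == 0 or top == (1 << 26) - 1


# Values copied from the dump above.
print(decode_scause(0x000000000000000d))
print(decode_sstatus(0x0000000200000120))
print(is_canonical_sv39(0xffffffc0002f043e))   # sepc
print(is_canonical_sv39(0xffffff822804431c))   # stval
```

With the values from the dump, scause decodes to a load page fault taken in
supervisor mode. If the Sv39 assumption holds, stval is not a canonical
address, and its low 32 bits equal those of a0 + 8. That would fit a load
through a garbage pointer, but it is a guess from the dump, not something I
have verified.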
--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE