Hi all,
I've got a couple of VMs that I update quite frequently. I use a
script to automate the update (in-place untarring of the various sets)
on each VM, and have another script on the host that iterates over the
VMs and upgrades each sequentially, orchestrating the process.
After the orchestrator script has completed the upgrade, it sshs in
and runs `doas reboot`:
    /usr/bin/ssh -t ${VM} doas reboot
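For context, the host-side loop amounts to roughly the sketch below. This is an illustration under assumptions, not the actual script: the VM names, the UPGRADE_CMD path and the SSH override variable are hypothetical placeholders; only the reboot invocation is taken from the command above.

```sh
#!/bin/sh
# Sketch of the host-side orchestrator: upgrade each VM in turn, then reboot it.
# SSH and UPGRADE_CMD are illustrative assumptions, overridable from the environment.
SSH=${SSH:-/usr/bin/ssh}
UPGRADE_CMD=${UPGRADE_CMD:-/usr/local/sbin/upgrade-sets}	# hypothetical per-VM script

upgrade_all() {
	for VM in "$@"; do
		# Run the in-place upgrade on the VM; skip the reboot if it fails.
		if ! "$SSH" "${VM}" doas "$UPGRADE_CMD"; then
			echo "upgrade failed on ${VM}, not rebooting" >&2
			continue
		fi
		# Same reboot invocation as above. ssh may exit non-zero when
		# the connection drops during the reboot, so its status is ignored.
		"$SSH" -t "${VM}" doas reboot || true
	done
}

# Example (hypothetical VM names): upgrade_all vm1 vm2
```

The VMs are handled strictly one after another, so a panic on one VM shows up before the next one is touched.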
Recently (over the last few weeks; I haven't really tracked this, sorry),
more often than not, I'm seeing panics because init gets SIGBUS:
syncing disks... done
panic: init died (signal 10, exit 0)
Stopped at db_enter+0x10: popq %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*475527      1      0       0x802     0x2000    0  init
db_enter() at db_enter+0x10
panic(ffffffff81e6fc03) at panic+0xb8
exit1(ffff8000fffffa40,0,a,1) at exit1+0x61d
trapsignal(ffff8000fffffa40,a,6,3,a7d799f19e0) at trapsignal+0x158
upageflttrap(ffff800014c96730,a7d799f19e0) at upageflttrap+0xf0
usertrap(ffff800014c96730) at usertrap+0x179
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7ffffc1280, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 23914  217005      1      0  2         0x3                reboot
 85576  149845      0      0  3     0x14280  nfsidl        nfsio
 85516  509272      0      0  3     0x14280  nfsidl        nfsio
 27313   82613      0      0  3     0x14280  nfsidl        nfsio
 74778  248263      0      0  3     0x14280  nfsidl        nfsio
 46983  166023      0      0  3     0x14200  bored         smr
 68429  136470      0      0  2     0x14200                zerothread
 86312  246770      0      0  3     0x14200  aiodoned      aiodoned
 45749   87877      0      0  2     0x14600                update
 67717  192012      0      0  3     0x14200  cleaner       cleaner
 15199  366441      0      0  3     0x14200  reaper        reaper
 35818  112882      0      0  3     0x14200  pgdaemon      pagedaemon
 47013  345768      0      0  3     0x14200  bored         crynlk
 74571  370572      0      0  3     0x14200  bored         crypto
 69117   88807      0      0  2     0x14200                softnet
 53953  240257      0      0  2     0x14200                systqmp
 53101  304322      0      0  3     0x14200  bored         systq
 46072  394366      0      0  3  0x40014200  bored         softclock
 59261  410646      0      0  3  0x40014200                idle0
*    1  475527      0      0  7      0x2802                init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb>
With some printf debugging, I've determined that this happens in
if_downall(). From the output above, vfs_shutdown() has just completed;
with a debugging kernel, a printf placed after resettodr() was shown,
but the printf placed after if_downall() wasn't.
I'll dig around a bit further, but if anyone has anything obvious I
should look into, I'm keen to hear it.
Thanks,
Paul
--
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
http://www.weirdnet.nl/