>Synopsis: Panic in kern_event.c using Rust async tcp sockets on multi-cpu
>system
>Category: kernel
>Environment:
System : OpenBSD 6.6
Details : OpenBSD 6.6-current (GENERIC.MP) #575: Mon Dec 30
04:47:45 MST 2019
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
Architecture: OpenBSD.amd64
Machine : amd64
>Description:
I was experimenting with some Rust async-std code when I managed to
trigger a panic on my laptop. I reproduced the same panic in a 2 cpu
VirtualBox instance hosted on macOS and the issue seems to be (to me)
a race condition as I haven't been able to reliable reproduce it in a
single core machine. (I tried a -current guest in vmm on my Lenovo
x270 to no avail.)
"fun" is the name of the offending rust program in the panic below,
which uses multiple threads to manage multiple TCP sockets. Basically
it's doing the following:
1. Spawning an event loop
2. Spawning multiple async listeners on different TCP ports
3. When a client connections, spawns another async handler for reading
data from the client connection.
4. Terminating the listener if the client reads a shutdown message
('die') signalling via an atomic boolean.
There's a timeout being used (100ms) that I put in to allow for seeing
the boolean change so the listening socket can shutdown.
Under the covers I believe the Rust async-std library is using kqueue
and setting sockets to non-blocking mode.
>From my serial console connection the panic and ddb output look like:
------
panic: kernel diagnostic assertion "kn->kn_status & KN_PROCESSING"
failed: file "/usr/src/sys/kern/kern_event.c", line 1015
Stopped at db_enter+0x10: popq %rbp
TID PID UID PRFLAGS PFLAGS CPU COMMAND
465238 49168 1000 0x100003 0 1 nc
*490511 60196 1000 0x3 0x4000000 0K fun
db_enter() at db_enter+0x10
panic(ffffffff81c5d8f2) at panic+0x128
__assert(ffffffff81cb3133,ffffffff81c79b29,3f7,ffffffff81c6ea37) at __assert+0x
2b
kqueue_scan(fffffd81e112b2d8,3e8,ec9fdc9d000,0,ffff8000ffff0798,ffff8000221e61d
8) at kqueue_scan+0x9ff
sys_kevent(ffff8000ffff0798,ffff8000221e6240,ffff8000221e62a0) at sys_kevent+0x
2a9
syscall(ffff8000221e6310) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xeca2e42da40, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{0}> trace
trace
db_enter() at db_enter+0x10
panic(ffffffff81c5d8f2) at panic+0x128
__assert(ffffffff81cb3133,ffffffff81c79b29,3f7,ffffffff81c6ea37) at __assert+0x
2b
kqueue_scan(fffffd81e112b2d8,3e8,ec9fdc9d000,0,ffff8000ffff0798,ffff8000221e61d
8) at kqueue_scan+0x9ff
sys_kevent(ffff8000ffff0798,ffff8000221e6240,ffff8000221e62a0) at sys_kevent+0x
2a9
syscall(ffff8000221e6310) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xeca2e42da40, count: -7
ddb{0}>
db_enter() at db_enter+0x10
end trace frame: 0xffff8000221e5da0, count: 0
ddb{0}> machine ddbcpu 0
machine ddbcpu 0
Invalid cpu 0
ddb{0}>
CPU not specified
ddb{0}> machine ddbcpu 1
machine ddbcpu 1
Stopped at x86_ipi_db+0x12: leave
x86_ipi_db(ffff800022010ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
_kernel_lock() at _kernel_lock+0xa9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffe81f0, count: 10
ddb{1}>
CPU not specified
ddb{1}> machine ddbcpu 2
machine ddbcpu 2
Invalid cpu 2
ddb{1}>
CPU not specified
ddb{1}> machine
machine
cpuinfo startcpu stopcpu ddbcpu acpi
ddb{1}>
CPU not specified
ddb{1}> machine cpuinfo
machine cpuinfo
0: stopped
* 1: ddb
ddb{1}>
0: stopped
* 1: ddb
ddb{1}> ps
ps
PID TID PPID UID S FLAGS WAIT COMMAND
51383 60254 32363 1000 3 0x100083 poll nc
32363 483771 33498 1000 3 0x10008b pause ksh
52751 213434 96560 1000 3 0x100083 poll nc
96560 392646 33498 1000 3 0x10008b pause ksh
*49168 465238 48098 1000 7 0x100003 nc
48098 274731 33498 1000 3 0x10008b pause ksh
59276 348353 24713 1000 3 0x100083 poll nc
24713 497442 33498 1000 3 0x10008b pause ksh
86121 486575 73285 1000 3 0x100083 poll nc
73285 346804 33498 1000 3 0x10008b pause ksh
84747 345032 98472 1000 3 0x100083 poll nc
73534 120988 53156 1000 3 0x100083 poll nc
60196 138251 9807 1000 3 0x83 fsleep fun
60196 230559 9807 1000 2 0x4000003 fun
60196 299224 9807 1000 2 0x4000003 fun
60196 490511 9807 1000 7 0x4000003 fun
60196 514631 9807 1000 3 0x4000083 fsleep fun
98472 350852 33498 1000 3 0x10008b pause ksh
53156 92298 33498 1000 3 0x10008b pause ksh
9807 488428 33498 1000 3 0x10008b pause ksh
33498 408845 1 1000 2 0x100000 tmux
10719 169841 78253 1000 3 0x100083 kqread tmux
78253 516809 35985 1000 3 0x10008b pause ksh
35985 490993 34312 1000 3 0x90 select sshd
34312 456339 26950 0 3 0x92 poll sshd
26179 522453 1 0 3 0x100083 ttyin getty
75271 6137 1 0 3 0x100083 ttyin getty
77364 87650 1 0 3 0x100083 ttyin getty
22047 206731 1 0 3 0x100083 ttyin getty
28805 5691 1 0 3 0x100083 ttyin getty
66108 188132 1 0 3 0x100098 poll cron
5616 401154 1 99 3 0x100090 poll sndiod
19570 276295 1 110 3 0x100090 poll sndiod
73915 68223 66398 95 3 0x100092 kqread smtpd
76313 504505 66398 103 3 0x100092 kqread smtpd
72220 522412 66398 95 3 0x100092 kqread smtpd
47619 319095 66398 95 3 0x100092 kqread smtpd
48905 440203 66398 95 3 0x100092 kqread smtpd
3780 161248 66398 95 3 0x100092 kqread smtpd
66398 437488 1 0 3 0x100080 kqread smtpd
26950 376937 1 0 3 0x80 select sshd
38443 307549 1 0 3 0x100080 poll ntpd
5019 498198 96567 83 3 0x100092 poll ntpd
96567 305954 1 83 3 0x100092 poll ntpd
48906 50341 69739 74 3 0x100092 bpf pflogd
69739 24998 1 0 3 0x80 netio pflogd
37271 400918 67137 73 3 0x100090 kqread syslogd
67137 395944 1 0 3 0x100082 netio syslogd
98604 332286 1 77 3 0x100090 poll dhclient
14816 154864 1 0 3 0x80 poll dhclient
12457 159696 91707 115 3 0x100092 kqread slaacd
34198 11917 91707 115 3 0x100092 kqread slaacd
91707 267613 1 0 3 0x100080 kqread slaacd
49906 284815 0 0 3 0x14200 pgzero zerothread
61278 167694 0 0 3 0x14200 aiodoned aiodoned
55003 170761 0 0 3 0x14200 syncer update
22637 329140 0 0 3 0x14200 cleaner cleaner
34567 16175 0 0 3 0x14200 reaper reaper
59841 103094 0 0 3 0x14200 pgdaemon pagedaemon
81674 442631 0 0 3 0x14200 bored crynlk
66359 485109 0 0 3 0x14200 bored crypto
95773 285761 0 0 3 0x14200 usbtsk usbtask
66082 192355 0 0 3 0x14200 usbatsk usbatsk
75164 409176 0 0 3 0x40014200 acpi0 acpi0
96710 48623 0 0 3 0x40014200 idle1
51380 227817 0 0 3 0x14200 bored softnet
71994 209188 0 0 3 0x14200 bored systqmp
72761 16933 0 0 3 0x14200 bored systq
12354 101692 0 0 3 0x40014200 bored softclock
97885 425400 0 0 3 0x40014200 idle0
42133 90808 0 0 3 0x14200 bored smr
1 176161 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb{1}>
------
>How-To-Repeat:
I only have a tedious manual reproduction at the moment, but it
involves running the below Rust 1.40 program and using a handful of nc
instances to hold open client connections while trying to kill the
listening socket.
1. Install the rust port in -current
2. Clone https://github.com/voutilad/async-fun
3. Run "cargo build --release" (it seems to occur only in release builds)
4. Using something like tmux, start the program (./target/release/fun)
and then open multiple connections using netcat (nc localhost 9990).
5. Quickly send a die command (echo die | nc localhost 9990) and then
switch to another nc instance and hit enter/return
6. If (un)lucky, you'll induce a kernel panic
I've been able to reproduce it both on a Lenovo x270 as well as a
VirtualBox guest instance both running the most recent snapshots. (The
dmesg below is from the vbox guest on macOS which is a cleaner
install.)
>Fix:
(I have no idea :-)
dmesg:
OpenBSD 6.6-current (GENERIC.MP) #575: Mon Dec 30 04:47:45 MST 2019
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8573091840 (8175MB)
avail mem = 8300802048 (7916MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xe1000 (10 entries)
bios0: vendor innotek GmbH version "VirtualBox" date 12/01/2006
bios0: innotek GmbH VirtualBox
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC SSDT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 3991.31 MHz, 06-3c-03
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,ITSC,FSGSBASE,AVX2,INVPCID,MD_CLEAR,L1DF,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: CPU supports MTRRs but not enabled by BIOS
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 3991.05 MHz, 06-3c-03
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,ITSC,FSGSBASE,AVX2,INVPCID,MD_CLEAR,L1DF,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins, remapped
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpiac0 at acpi0: AC unit online
acpivideo0 at acpi0: GFX0
cpu0: using VERW MDS workaround (except on vmm entry)
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371AB IDE" rev 0x01: DMA,
channel 0 configured to compatibility, channel 1 configured to
compatibility
wd0 at pciide0 channel 0 drive 0: <VBOX HARDDISK>
wd0: 128-sector PIO, LBA, 49152MB, 100663296 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <VBOX, CD-ROM, 1.0> removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
vga1 at pci0 dev 2 function 0 "InnoTek VirtualBox Graphics Adapter" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
em0 at pci0 dev 3 function 0 "Intel 82540EM" rev 0x02: apic 2 int 19,
address 08:00:27:89:3c:30
"InnoTek VirtualBox Guest Service" rev 0x00 at pci0 dev 4 function 0
not configured
ohci0 at pci0 dev 6 function 0 "Apple Intrepid USB" rev 0x00: apic 2
int 22, version 1.0
piixpm0 at pci0 dev 7 function 0 "Intel 82371AB Power" rev 0x08: apic 2 int 23
iic0 at piixpm0
em1 at pci0 dev 8 function 0 "Intel 82540EM" rev 0x02: apic 2 int 16,
address 08:00:27:ee:98:65
ehci0 at pci0 dev 11 function 0 "Intel 82801FB USB" rev 0x00: apic 2 int 19
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev
2.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Apple OHCI root hub" rev
1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (24921028cac3c2b1.a) swap on wd0b dump on wd0b
WARNING: / was not properly unmounted
usbdevs:
Controller /dev/usb0:
addr 01: 8086:0000 Intel, EHCI root hub
high speed, self powered, config 1, rev 1.00
driver: uhub0
Controller /dev/usb1:
addr 01: 106b:0000 Apple, OHCI root hub
full speed, self powered, config 1, rev 1.00
driver: uhub1