>Synopsis:      Panic in kern_event.c using Rust async tcp sockets on multi-cpu 
>system
>Category:      kernel
>Environment:
System      : OpenBSD 6.6
Details     : OpenBSD 6.6-current (GENERIC.MP) #575: Mon Dec 30
04:47:45 MST 2019
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine     : amd64
>Description:
I was experimenting with some Rust async-std code when I managed to
trigger a panic on my laptop. I reproduced the same panic in a 2 cpu
VirtualBox instance hosted on macOS and the issue seems to be (to me)
a race condition as I haven't been able to reliable reproduce it in a
single core machine. (I tried a -current guest in vmm on my Lenovo
x270 to no avail.)

"fun" is the name of the offending rust program in the panic below,
which uses multiple threads to manage multiple TCP sockets. Basically
it's doing the following:

1. Spawning an event loop
2. Spawning multiple async listeners on different TCP ports
3. When a client connections, spawns another async handler for reading
data from the client connection.
4. Terminating the listener if the client reads a shutdown message
('die') signalling via an atomic boolean.

There's a timeout being used (100ms) that I put in to allow for seeing
the boolean change so the listening socket can shutdown.

Under the covers I believe the Rust async-std library is using kqueue
and setting sockets to non-blocking mode.

>From my serial console connection the panic and ddb output look like:

------
panic: kernel diagnostic assertion "kn->kn_status & KN_PROCESSING"
failed: file "/usr/src/sys/kern/kern_event.c", line 1015
Stopped at      db_enter+0x10:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 465238  49168   1000    0x100003          0    1  nc
*490511  60196   1000         0x3  0x4000000    0K fun
db_enter() at db_enter+0x10
panic(ffffffff81c5d8f2) at panic+0x128
__assert(ffffffff81cb3133,ffffffff81c79b29,3f7,ffffffff81c6ea37) at __assert+0x
2b
kqueue_scan(fffffd81e112b2d8,3e8,ec9fdc9d000,0,ffff8000ffff0798,ffff8000221e61d
8) at kqueue_scan+0x9ff
sys_kevent(ffff8000ffff0798,ffff8000221e6240,ffff8000221e62a0) at sys_kevent+0x
2a9
syscall(ffff8000221e6310) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xeca2e42da40, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}> trace
trace
db_enter() at db_enter+0x10
panic(ffffffff81c5d8f2) at panic+0x128
__assert(ffffffff81cb3133,ffffffff81c79b29,3f7,ffffffff81c6ea37) at __assert+0x
2b
kqueue_scan(fffffd81e112b2d8,3e8,ec9fdc9d000,0,ffff8000ffff0798,ffff8000221e61d
8) at kqueue_scan+0x9ff
sys_kevent(ffff8000ffff0798,ffff8000221e6240,ffff8000221e62a0) at sys_kevent+0x
2a9
syscall(ffff8000221e6310) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xeca2e42da40, count: -7
ddb{0}>
db_enter() at db_enter+0x10
end trace frame: 0xffff8000221e5da0, count: 0
ddb{0}> machine ddbcpu 0
machine ddbcpu 0
Invalid cpu 0
ddb{0}>
CPU not specified
ddb{0}> machine ddbcpu 1
machine ddbcpu 1
Stopped at      x86_ipi_db+0x12:        leave
x86_ipi_db(ffff800022010ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
_kernel_lock() at _kernel_lock+0xa9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffe81f0, count: 10
ddb{1}>
CPU not specified
ddb{1}> machine ddbcpu 2
machine ddbcpu 2
Invalid cpu 2
ddb{1}>
CPU not specified
ddb{1}> machine
machine
cpuinfo     startcpu    stopcpu     ddbcpu      acpi
ddb{1}>
CPU not specified
ddb{1}> machine cpuinfo
machine cpuinfo
    0: stopped
*   1: ddb
ddb{1}>
    0: stopped
*   1: ddb
ddb{1}> ps
ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 51383   60254  32363   1000  3    0x100083  poll          nc
 32363  483771  33498   1000  3    0x10008b  pause         ksh
 52751  213434  96560   1000  3    0x100083  poll          nc
 96560  392646  33498   1000  3    0x10008b  pause         ksh
*49168  465238  48098   1000  7    0x100003                nc
 48098  274731  33498   1000  3    0x10008b  pause         ksh
 59276  348353  24713   1000  3    0x100083  poll          nc
 24713  497442  33498   1000  3    0x10008b  pause         ksh
 86121  486575  73285   1000  3    0x100083  poll          nc
 73285  346804  33498   1000  3    0x10008b  pause         ksh
 84747  345032  98472   1000  3    0x100083  poll          nc
 73534  120988  53156   1000  3    0x100083  poll          nc
 60196  138251   9807   1000  3        0x83  fsleep        fun
 60196  230559   9807   1000  2   0x4000003                fun
 60196  299224   9807   1000  2   0x4000003                fun
 60196  490511   9807   1000  7   0x4000003                fun
 60196  514631   9807   1000  3   0x4000083  fsleep        fun
 98472  350852  33498   1000  3    0x10008b  pause         ksh
 53156   92298  33498   1000  3    0x10008b  pause         ksh
  9807  488428  33498   1000  3    0x10008b  pause         ksh
 33498  408845      1   1000  2    0x100000                tmux
 10719  169841  78253   1000  3    0x100083  kqread        tmux
  78253  516809  35985   1000  3    0x10008b  pause         ksh
 35985  490993  34312   1000  3        0x90  select        sshd
 34312  456339  26950      0  3        0x92  poll          sshd
 26179  522453      1      0  3    0x100083  ttyin         getty
 75271    6137      1      0  3    0x100083  ttyin         getty
 77364   87650      1      0  3    0x100083  ttyin         getty
 22047  206731      1      0  3    0x100083  ttyin         getty
 28805    5691      1      0  3    0x100083  ttyin         getty
 66108  188132      1      0  3    0x100098  poll          cron
  5616  401154      1     99  3    0x100090  poll          sndiod
 19570  276295      1    110  3    0x100090  poll          sndiod
 73915   68223  66398     95  3    0x100092  kqread        smtpd
 76313  504505  66398    103  3    0x100092  kqread        smtpd
 72220  522412  66398     95  3    0x100092  kqread        smtpd
 47619  319095  66398     95  3    0x100092  kqread        smtpd
 48905  440203  66398     95  3    0x100092  kqread        smtpd
  3780  161248  66398     95  3    0x100092  kqread        smtpd
 66398  437488      1      0  3    0x100080  kqread        smtpd
 26950  376937      1      0  3        0x80  select        sshd
 38443  307549      1      0  3    0x100080  poll          ntpd
  5019  498198  96567     83  3    0x100092  poll          ntpd
 96567  305954      1     83  3    0x100092  poll          ntpd
 48906   50341  69739     74  3    0x100092  bpf           pflogd
 69739   24998      1      0  3        0x80  netio         pflogd
 37271  400918  67137     73  3    0x100090  kqread        syslogd
 67137  395944      1      0  3    0x100082  netio         syslogd
 98604  332286      1     77  3    0x100090  poll          dhclient
 14816  154864      1      0  3        0x80  poll          dhclient
 12457  159696  91707    115  3    0x100092  kqread        slaacd
 34198   11917  91707    115  3    0x100092  kqread        slaacd
 91707  267613      1      0  3    0x100080  kqread        slaacd
 49906  284815      0      0  3     0x14200  pgzero        zerothread
 61278  167694      0      0  3     0x14200  aiodoned      aiodoned
 55003  170761      0      0  3     0x14200  syncer        update
 22637  329140      0      0  3     0x14200  cleaner       cleaner
 34567   16175      0      0  3     0x14200  reaper        reaper
 59841  103094      0      0  3     0x14200  pgdaemon      pagedaemon
 81674  442631      0      0  3     0x14200  bored         crynlk
 66359  485109      0      0  3     0x14200  bored         crypto
 95773  285761      0      0  3     0x14200  usbtsk        usbtask
 66082  192355      0      0  3     0x14200  usbatsk       usbatsk
 75164  409176      0      0  3  0x40014200  acpi0         acpi0
 96710   48623      0      0  3  0x40014200                idle1
 51380  227817      0      0  3     0x14200  bored         softnet
 71994  209188      0      0  3     0x14200  bored         systqmp
 72761   16933      0      0  3     0x14200  bored         systq
 12354  101692      0      0  3  0x40014200  bored         softclock
 97885  425400      0      0  3  0x40014200                idle0
 42133   90808      0      0  3     0x14200  bored         smr
     1  176161      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{1}>
------


>How-To-Repeat:
I only have a tedious manual reproduction at the moment, but it
involves running the below Rust 1.40 program and using a handful of nc
instances to hold open client connections while trying to kill the
listening socket.

1. Install the rust port in -current
2. Clone https://github.com/voutilad/async-fun
3. Run "cargo build --release" (it seems to occur only in release builds)
4. Using something like tmux, start the program (./target/release/fun)
and then open multiple connections using netcat (nc localhost 9990).
5. Quickly send a die command (echo die | nc localhost 9990) and then
switch to another nc instance and hit enter/return
6. If (un)lucky, you'll induce a kernel panic

I've been able to reproduce it both on a Lenovo x270 as well as a
VirtualBox guest instance both running the most recent snapshots. (The
dmesg below is from the vbox guest on macOS which is a cleaner
install.)

>Fix:
(I have no idea :-)


dmesg:
OpenBSD 6.6-current (GENERIC.MP) #575: Mon Dec 30 04:47:45 MST 2019
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8573091840 (8175MB)
avail mem = 8300802048 (7916MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xe1000 (10 entries)
bios0: vendor innotek GmbH version "VirtualBox" date 12/01/2006
bios0: innotek GmbH VirtualBox
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC SSDT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 3991.31 MHz, 06-3c-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,ITSC,FSGSBASE,AVX2,INVPCID,MD_CLEAR,L1DF,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: CPU supports MTRRs but not enabled by BIOS
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 3991.05 MHz, 06-3c-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,ITSC,FSGSBASE,AVX2,INVPCID,MD_CLEAR,L1DF,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins, remapped
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpiac0 at acpi0: AC unit online
acpivideo0 at acpi0: GFX0
cpu0: using VERW MDS workaround (except on vmm entry)
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371AB IDE" rev 0x01: DMA,
channel 0 configured to compatibility, channel 1 configured to
compatibility
wd0 at pciide0 channel 0 drive 0: <VBOX HARDDISK>
wd0: 128-sector PIO, LBA, 49152MB, 100663296 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <VBOX, CD-ROM, 1.0> removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
vga1 at pci0 dev 2 function 0 "InnoTek VirtualBox Graphics Adapter" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
em0 at pci0 dev 3 function 0 "Intel 82540EM" rev 0x02: apic 2 int 19,
address 08:00:27:89:3c:30
"InnoTek VirtualBox Guest Service" rev 0x00 at pci0 dev 4 function 0
not configured
ohci0 at pci0 dev 6 function 0 "Apple Intrepid USB" rev 0x00: apic 2
int 22, version 1.0
piixpm0 at pci0 dev 7 function 0 "Intel 82371AB Power" rev 0x08: apic 2 int 23
iic0 at piixpm0
em1 at pci0 dev 8 function 0 "Intel 82540EM" rev 0x02: apic 2 int 16,
address 08:00:27:ee:98:65
ehci0 at pci0 dev 11 function 0 "Intel 82801FB USB" rev 0x00: apic 2 int 19
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev
2.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Apple OHCI root hub" rev
1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (24921028cac3c2b1.a) swap on wd0b dump on wd0b
WARNING: / was not properly unmounted

usbdevs:
Controller /dev/usb0:
addr 01: 8086:0000 Intel, EHCI root hub
high speed, self powered, config 1, rev 1.00
driver: uhub0
Controller /dev/usb1:
addr 01: 106b:0000 Apple, OHCI root hub
full speed, self powered, config 1, rev 1.00
driver: uhub1

Reply via email to