> Date: Mon, 19 May 2025 00:02:38 +0200 > From: Martin Pieuchot <m...@grenadille.net> > > On 18/05/25(Sun) 23:43, Mark Kettenis wrote: > > > From: Dave Voutila <d...@sisu.io> > > > Date: Sun, 18 May 2025 15:21:28 -0400 > > > > > > Claudio Jeker <cje...@diehard.n-r-g.com> writes: > > > > > > > On Thu, May 15, 2025 at 01:15:43PM +0200, Martin Pieuchot wrote: > > > >> Fault below has been triggered by dong 'vmctl $myvm stop' while > > > >> rebooting a 7.7 amd64 VM: > > > >> > > > >> OpenBSD 7.7 (GENERIC) #619: Sun Apr 13 08:19:34 MDT 2025 > > > >> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC > > > >> real mem = 117428224 (111MB) > > > >> avail mem = 87916544 (83MB) > > > >> random: good seed from bootblocks > > > >> mpath0 at root > > > >> scsibus0 at mpath0: 256 targets > > > >> mainbus0 at root > > > >> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf2760 (10 entries) > > > >> bios0: vendor SeaBIOS version "1.16.3p0-OpenBSD-vmm" date 01/01/2011 > > > >> bios0: OpenBSD VMM > > > >> acpi at bios0 not configured > > > >> cpu0 at mainbus0: (uniprocessor) > > > >> cpu0: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.43 MHz, 06-3d-04 > > > >> cpu0: cpuid 1 > > > >> edx=78ba97f<FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2> > > > >> ecx=f6d83203<SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV> > > > >> cpu0: cpuid 7.0 > > > >> ebx=1c23a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,RDSEED,ADX,SMAP> > > > >> edx=400<MD_CLEAR> > > > >> cpu0: cpuid d.1 eax=1<XSAVEOPT> > > > >> cpu0: cpuid 80000001 edx=24100800<NXE,PAGE1GB,LONG> > > > >> ecx=121<LAHF,ABM,3DNOWP> > > > >> cpu0: cpuid 80000007 edx=100<ITSC> > > > >> cpu0: MELTDOWN > > > >> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB > > > >> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache > > > >> cpu0: smt 0, core 0, package 0 > > > >> cpu0: using VERW MDS workaround > > > >> pvbus0 at mainbus0: OpenBSD > > > >> pvclock0 at pvbus0 > > > >> pci0 at mainbus0 bus 0 > > > >> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00 > > > >> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00 > > > >> viornd0 at virtio0 > > > >> virtio0: irq 3 > > > >> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00 > > > >> vio0 at virtio1: 1 queue, address fe:e1:bb:d1:de:21 > > > >> virtio1: irq 5 > > > >> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00 > > > >> vioblk0 at virtio2 > > > >> virtio2: irq 6 > > > >> scsibus1 at vioblk0: 1 targets > > > >> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > > > > >> sd0: 3072MB, 512 bytes/sector, 6291456 sectors > > > >> virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00 > > > >> vmmci0 at virtio3 > > > >> virtio3: irq 7 > > > >> isa0 at mainbus0 > > > >> isadma0 at isa0 > > > >> com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo > > > >> com0: console > > > >> Rebooting in response to request from vmmci0 host > > > >> uvm_fault(0xffffffff82a56848, 0xf0, 0, 1) -> e > > > >> kernel: page fault trap, code=0 > > > >> Stopped at mtx_enter+0x2f: movq 0(%rdi),%rax > > > >> TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > >> * 0 0 0 0x10000 0x200 0 swapper > > > >> mtx_enter(f0,f0,cea0d74fd2320eb6,f0,10,ffffffff82fbdcc0) at > > > >> mtx_enter+0x2f > > > >> prsignal(0,2,2,1,ffff800000090400,d) at prsignal+0x26 > > > >> vmmci_config_change(ffff800000090400,ffff800000090400,86acf72f78ea7d07,0,ffff80 > > > >> 0000090400,0) at vmmci_config_change+0xf6 > > > >> virtio_pci_legacy_intr(ffff800000090400,ffff800000090400,c9a59ef2495be15f,fffff > > > >> fff81004000,0,4) at virtio_pci_legacy_intr+0x63 > > > >> intr_handler(ffffffff82fbddf0,ffff80000008e500,cbae99a56f027dfe,ffff80000008e48 > > > >> 0,ffffffff8152e8b6,ffffffff82fbdde0) at intr_handler+0x56 > > > >> Xintr_legacy7_untramp(0,ffffffff824e65b0,0,18041969,ffffffff81004000,e) > > > >> at Xint > > > >> r_legacy7_untramp+0x1a3 > > > >> Xspllower(0,0,c0d762ae039e8784,ffffffff81004000,ffffffff8152ed5c,ffffffff82fb70 > > > >> 08) at Xspllower+0x1d > > > >> cpu_configure(4449abb23155d965,0,0,ffff800000032000,ffffffff812a613f,ffffffff82 > > > >> fbdf10) at cpu_configure+0xaf > > > >> main(e,e,0,1,ffffffff816ab787,ffffffff82fbdf40) at main+0x427 > > > >> end trace frame: 0x0, count: 6 > > > >> https://www.openbsd.org/ddb.html describes the minimum info required > > > >> in bug > > > >> > > > > > > > > I guess initprocess is still NULL in pvbus_reboot() and so prsignal > > > > blows > > > > up trying to access the ps_mtx. > > > > > > > > I would suggest to use startuphook_establish() to delay the attach of > > > > vmmci0 until initprocess is set. > > > > > > So here's what I'm thinking... > > > > > > I first tried delaying the virtio_attach_finish call using > > > startuphook_establish(9) but there seems to be a lifetime issue with the > > > virtio_attach_args struct. Specifically, virtio_pci_attach_finish > > > dereferences it as a virtio_pci_attach_args. At this point, the > > > pci_attach_args* member vpa_pa seems trashed. > > > > Yes, as soon as you return from the attach function, the attach_args > > struct will be freed. You can make a copy of the struct and stash it > > away for later. But even then you have to be careful that pointer in > > the struct may no longer be valid. > > > > > Instead, I took the approach of using kthread_create_deferred(9) to > > > queue up signalling the shutdown or reboot command. If called late > > > enough, this ends up a direct call of the callback function instead of > > > adding it to the deferred kthread queue. > > > > > > In my testing, it can still result in the signal hitting initprocess > > > after it's alive, but before it can handle it. If that happens, it ends > > > up not causing a shutdown / reboot. To deal with this and the fact that > > > vmd thinks the command is in flight, I arm a timeout to try again after > > > 5 seconds. > > > > > > It kinda sucks, but it works. I'm all ears as to another approach or if > > > someone has a clue as to the lifetime issue I'm seeing with deferring > > > virtio pci attachment. > > > > I wonder if it would make sense to have a more generic solution. I > > suspect a similar issue may arise if you press the (ACPI) powerbutton > > on a machine during boot. I think an API, that can be safely called > > from interrupt context, would make sense. Currently we have several > > ssubtly different implementations of the same code, at least for > > shutdown, that all have one or more bugs... > > Are you suggesting using something in-kernel instead of sending a signal > to the init process which might not yet exist?
Nah. All I'm saying is that there should be a single function in the kernel that drivers can call to do a clean shutdown (and another one to do a reboot). It should do all the magic, including a check on initproc before sending the signal. But maybe having the kernel call boot() directly instead of sending a signal if initproc is NULL is a reasonable way to handle this situation. But I'd still arguing for separating that logic out in a separate function and having all drivers that currently call prsignal(initproc, ...) call that function. > > > diff refs/heads/master refs/heads/vmmci-panic > > > commit - 3c994bb556f544ca04982fd977c153c1485aab1e > > > commit + c58acacf15c284967253b4196b7d24aedf9ad1a2 > > > blob - 984626393cd85a97b6b84a19c82c8175aa2e4379 > > > blob + 0a4eac6d9ed700c383d332e5360bf0aea0a464de > > > --- sys/dev/pv/vmmci.c > > > +++ sys/dev/pv/vmmci.c > > > @@ -21,6 +21,7 @@ > > > #include <sys/timeout.h> > > > #include <sys/device.h> > > > #include <sys/sensors.h> > > > +#include <sys/kthread.h> > > > > > > #include <machine/bus.h> > > > > > > @@ -39,10 +40,12 @@ struct vmmci_softc { > > > struct device sc_dev; > > > struct virtio_softc *sc_virtio; > > > enum vmmci_cmd sc_cmd; > > > + enum vmmci_cmd sc_cmd_async; > > > unsigned int sc_interval; > > > struct ksensordev sc_sensordev; > > > struct ksensor sc_sensor; > > > struct timeout sc_tick; > > > + struct timeout sc_to_shutdown; > > > }; > > > > > > int vmmci_match(struct device *, void *, void *); > > > @@ -52,6 +55,7 @@ int vmmci_activate(struct device *, int); > > > int vmmci_config_change(struct virtio_softc *); > > > void vmmci_tick(void *); > > > void vmmci_tick_hook(struct device *); > > > +void vmmci_worker(void *); > > > > > > const struct cfattach vmmci_ca = { > > > sizeof(struct vmmci_softc), > > > @@ -116,6 +120,8 @@ vmmci_attach(struct device *parent, struct device *sel > > > config_mountroot(self, vmmci_tick_hook); > > > } > > > > > > + timeout_set(&sc->sc_to_shutdown, vmmci_worker, sc); > > > + > > > printf("\n"); > > > if (virtio_attach_finish(vsc, va) != 0) > > > goto err; > > > @@ -171,18 +177,22 @@ vmmci_config_change(struct virtio_softc *vsc) > > > /* no action */ > > > break; > > > case VMMCI_SHUTDOWN: > > > - pvbus_shutdown(&sc->sc_dev); > > > - break; > > > case VMMCI_REBOOT: > > > - pvbus_reboot(&sc->sc_dev); > > > + /* > > > + * Asynchronously process shutdown/reboot to allow us to handle > > > + * when vmmci(4) receives a command during system startup or > > > + * shutdown. > > > + */ > > > + sc->sc_cmd_async = cmd; > > > + kthread_create_deferred(vmmci_worker, sc); > > > break; > > > case VMMCI_SYNCRTC: > > > inittodr(gettime()); > > > sc->sc_cmd = VMMCI_NONE; > > > - break; > > > + break; > > > default: > > > printf("%s: invalid command %d\n", sc->sc_dev.dv_xname, cmd); > > > - cmd = VMMCI_NONE; > > > + cmd = VMMCI_NONE; > > > break; > > > } > > > > > > @@ -226,3 +236,25 @@ vmmci_tick_hook(struct device *self) > > > timeout_set(&sc->sc_tick, vmmci_tick, sc); > > > vmmci_tick(sc); > > > } > > > + > > > +void > > > +vmmci_worker(void *arg) > > > +{ > > > + struct vmmci_softc *sc = (struct vmmci_softc *)arg; > > > + > > > + switch (sc->sc_cmd_async) { > > > + case VMMCI_SHUTDOWN: > > > + pvbus_shutdown(&sc->sc_dev); > > > + break; > > > + case VMMCI_REBOOT: > > > + pvbus_reboot(&sc->sc_dev); > > > + break; > > > + default: > > > + printf("%s: invalid async command %d\n", sc->sc_dev.dv_xname, > > > + sc->sc_cmd_async); > > > + return; > > > + } > > > + > > > + /* Re-arm in case initprocess misses our signal. */ > > > + timeout_add_sec(&sc->sc_to_shutdown, 5); > > > +} > > > > > > > > > > >