On 18/05/25(Sun) 23:43, Mark Kettenis wrote:
> > From: Dave Voutila <d...@sisu.io>
> > Date: Sun, 18 May 2025 15:21:28 -0400
> > 
> > Claudio Jeker <cje...@diehard.n-r-g.com> writes:
> > 
> > > On Thu, May 15, 2025 at 01:15:43PM +0200, Martin Pieuchot wrote:
> > >> Fault below has been triggered by dong 'vmctl $myvm stop' while
> > >> rebooting a 7.7 amd64 VM:
> > >>
> > >> OpenBSD 7.7 (GENERIC) #619: Sun Apr 13 08:19:34 MDT 2025
> > >>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > >> real mem = 117428224 (111MB)
> > >> avail mem = 87916544 (83MB)
> > >> random: good seed from bootblocks
> > >> mpath0 at root
> > >> scsibus0 at mpath0: 256 targets
> > >> mainbus0 at root
> > >> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf2760 (10 entries)
> > >> bios0: vendor SeaBIOS version "1.16.3p0-OpenBSD-vmm" date 01/01/2011
> > >> bios0: OpenBSD VMM
> > >> acpi at bios0 not configured
> > >> cpu0 at mainbus0: (uniprocessor)
> > >> cpu0: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.43 MHz, 06-3d-04
> > >> cpu0: cpuid 1
> > >> edx=78ba97f<FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2>
> > >> ecx=f6d83203<SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV>
> > >> cpu0: cpuid 7.0 
> > >> ebx=1c23a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,RDSEED,ADX,SMAP> 
> > >> edx=400<MD_CLEAR>
> > >> cpu0: cpuid d.1 eax=1<XSAVEOPT>
> > >> cpu0: cpuid 80000001 edx=24100800<NXE,PAGE1GB,LONG> 
> > >> ecx=121<LAHF,ABM,3DNOWP>
> > >> cpu0: cpuid 80000007 edx=100<ITSC>
> > >> cpu0: MELTDOWN
> > >> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> > >> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> > >> cpu0: smt 0, core 0, package 0
> > >> cpu0: using VERW MDS workaround
> > >> pvbus0 at mainbus0: OpenBSD
> > >> pvclock0 at pvbus0
> > >> pci0 at mainbus0 bus 0
> > >> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
> > >> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
> > >> viornd0 at virtio0
> > >> virtio0: irq 3
> > >> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
> > >> vio0 at virtio1: 1 queue, address fe:e1:bb:d1:de:21
> > >> virtio1: irq 5
> > >> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
> > >> vioblk0 at virtio2
> > >> virtio2: irq 6
> > >> scsibus1 at vioblk0: 1 targets
> > >> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, >
> > >> sd0: 3072MB, 512 bytes/sector, 6291456 sectors
> > >> virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
> > >> vmmci0 at virtio3
> > >> virtio3: irq 7
> > >> isa0 at mainbus0
> > >> isadma0 at isa0
> > >> com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
> > >> com0: console
> > >> Rebooting in response to request from vmmci0 host
> > >> uvm_fault(0xffffffff82a56848, 0xf0, 0, 1) -> e
> > >> kernel: page fault trap, code=0
> > >> Stopped at      mtx_enter+0x2f: movq    0(%rdi),%rax
> > >>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >> *     0      0      0     0x10000      0x200    0  swapper
> > >> mtx_enter(f0,f0,cea0d74fd2320eb6,f0,10,ffffffff82fbdcc0) at 
> > >> mtx_enter+0x2f
> > >> prsignal(0,2,2,1,ffff800000090400,d) at prsignal+0x26
> > >> vmmci_config_change(ffff800000090400,ffff800000090400,86acf72f78ea7d07,0,ffff80
> > >> 0000090400,0) at vmmci_config_change+0xf6
> > >> virtio_pci_legacy_intr(ffff800000090400,ffff800000090400,c9a59ef2495be15f,fffff
> > >> fff81004000,0,4) at virtio_pci_legacy_intr+0x63
> > >> intr_handler(ffffffff82fbddf0,ffff80000008e500,cbae99a56f027dfe,ffff80000008e48
> > >> 0,ffffffff8152e8b6,ffffffff82fbdde0) at intr_handler+0x56
> > >> Xintr_legacy7_untramp(0,ffffffff824e65b0,0,18041969,ffffffff81004000,e) 
> > >> at Xint
> > >> r_legacy7_untramp+0x1a3
> > >> Xspllower(0,0,c0d762ae039e8784,ffffffff81004000,ffffffff8152ed5c,ffffffff82fb70
> > >> 08) at Xspllower+0x1d
> > >> cpu_configure(4449abb23155d965,0,0,ffff800000032000,ffffffff812a613f,ffffffff82
> > >> fbdf10) at cpu_configure+0xaf
> > >> main(e,e,0,1,ffffffff816ab787,ffffffff82fbdf40) at main+0x427
> > >> end trace frame: 0x0, count: 6
> > >> https://www.openbsd.org/ddb.html describes the minimum info required in 
> > >> bug
> > >>
> > >
> > > I guess initprocess is still NULL in pvbus_reboot() and so prsignal blows
> > > up trying to access the ps_mtx.
> > >
> > > I would suggest to use startuphook_establish() to delay the attach of
> > > vmmci0 until initprocess is set.
> > 
> > So here's what I'm thinking...
> > 
> > I first tried delaying the virtio_attach_finish call using
> > startuphook_establish(9) but there seems to be a lifetime issue with the
> > virtio_attach_args struct. Specifically, virtio_pci_attach_finish
> > dereferences it as a virtio_pci_attach_args. At this point, the
> > pci_attach_args* member vpa_pa seems trashed.
> 
> Yes, as soon as you return from the attach function, the attach_args
> struct will be freed.  You can make a copy of the struct and stash it
> away for later.  But even then you have to be careful that pointer in
> the struct may no longer be valid.
> 
> > Instead, I took the approach of using kthread_create_deferred(9) to
> > queue up signalling the shutdown or reboot command. If called late
> > enough, this ends up a direct call of the callback function instead of
> > adding it to the deferred kthread queue.
> > 
> > In my testing, it can still result in the signal hitting initprocess
> > after it's alive, but before it can handle it. If that happens, it ends
> > up not causing a shutdown / reboot. To deal with this and the fact that
> > vmd thinks the command is in flight, I arm a timeout to try again after
> > 5 seconds.
> > 
> > It kinda sucks, but it works. I'm all ears as to another approach or if
> > someone has a clue as to the lifetime issue I'm seeing with deferring
> > virtio pci attachment.
> 
> I wonder if it would make sense to have a more generic solution.  I
> suspect a similar issue may arise if you press the (ACPI) powerbutton
> on a machine during boot.  I think an API, that can be safely called
> from interrupt context, would make sense.  Currently we have several
> ssubtly different implementations of the same code, at least for
> shutdown, that all have one or more bugs...

Are you suggesting using something in-kernel instead of sending a signal
to the init process which might not yet exist?

> > diff refs/heads/master refs/heads/vmmci-panic
> > commit - 3c994bb556f544ca04982fd977c153c1485aab1e
> > commit + c58acacf15c284967253b4196b7d24aedf9ad1a2
> > blob - 984626393cd85a97b6b84a19c82c8175aa2e4379
> > blob + 0a4eac6d9ed700c383d332e5360bf0aea0a464de
> > --- sys/dev/pv/vmmci.c
> > +++ sys/dev/pv/vmmci.c
> > @@ -21,6 +21,7 @@
> >  #include <sys/timeout.h>
> >  #include <sys/device.h>
> >  #include <sys/sensors.h>
> > +#include <sys/kthread.h>
> > 
> >  #include <machine/bus.h>
> > 
> > @@ -39,10 +40,12 @@ struct vmmci_softc {
> >     struct device            sc_dev;
> >     struct virtio_softc     *sc_virtio;
> >     enum vmmci_cmd           sc_cmd;
> > +   enum vmmci_cmd           sc_cmd_async;
> >     unsigned int             sc_interval;
> >     struct ksensordev        sc_sensordev;
> >     struct ksensor           sc_sensor;
> >     struct timeout           sc_tick;
> > +   struct timeout           sc_to_shutdown;
> >  };
> > 
> >  int        vmmci_match(struct device *, void *, void *);
> > @@ -52,6 +55,7 @@ int       vmmci_activate(struct device *, int);
> >  int        vmmci_config_change(struct virtio_softc *);
> >  void       vmmci_tick(void *);
> >  void       vmmci_tick_hook(struct device *);
> > +void       vmmci_worker(void *);
> > 
> >  const struct cfattach vmmci_ca = {
> >     sizeof(struct vmmci_softc),
> > @@ -116,6 +120,8 @@ vmmci_attach(struct device *parent, struct device *sel
> >             config_mountroot(self, vmmci_tick_hook);
> >     }
> > 
> > +   timeout_set(&sc->sc_to_shutdown, vmmci_worker, sc);
> > +
> >     printf("\n");
> >     if (virtio_attach_finish(vsc, va) != 0)
> >             goto err;
> > @@ -171,18 +177,22 @@ vmmci_config_change(struct virtio_softc *vsc)
> >             /* no action */
> >             break;
> >     case VMMCI_SHUTDOWN:
> > -           pvbus_shutdown(&sc->sc_dev);
> > -           break;
> >     case VMMCI_REBOOT:
> > -           pvbus_reboot(&sc->sc_dev);
> > +           /*
> > +            * Asynchronously process shutdown/reboot to allow us to handle
> > +            * when vmmci(4) receives a command during system startup or
> > +            * shutdown.
> > +            */
> > +           sc->sc_cmd_async = cmd;
> > +           kthread_create_deferred(vmmci_worker, sc);
> >             break;
> >     case VMMCI_SYNCRTC:
> >             inittodr(gettime());
> >             sc->sc_cmd = VMMCI_NONE;
> > -           break;
> > +           break;
> >     default:
> >             printf("%s: invalid command %d\n", sc->sc_dev.dv_xname, cmd);
> > -           cmd = VMMCI_NONE;
> > +           cmd = VMMCI_NONE;
> >             break;
> >     }
> > 
> > @@ -226,3 +236,25 @@ vmmci_tick_hook(struct device *self)
> >     timeout_set(&sc->sc_tick, vmmci_tick, sc);
> >     vmmci_tick(sc);
> >  }
> > +
> > +void
> > +vmmci_worker(void *arg)
> > +{
> > +   struct vmmci_softc      *sc = (struct vmmci_softc *)arg;
> > +
> > +   switch (sc->sc_cmd_async) {
> > +   case VMMCI_SHUTDOWN:
> > +           pvbus_shutdown(&sc->sc_dev);
> > +           break;
> > +   case VMMCI_REBOOT:
> > +           pvbus_reboot(&sc->sc_dev);
> > +           break;
> > +   default:
> > +           printf("%s: invalid async command %d\n", sc->sc_dev.dv_xname,
> > +               sc->sc_cmd_async);
> > +           return;
> > +   }
> > +
> > +   /* Re-arm in case initprocess misses our signal. */
> > +   timeout_add_sec(&sc->sc_to_shutdown, 5);
> > +}
> > 
> > 
> 


Reply via email to