On Tue, 22 Oct 2019, Andreas Rottmann wrote:
> >Synopsis: panic: pvclock0: unstable result on stable clock
> >Category: virtualization
> >Environment:
> System : OpenBSD 6.6
> Details : OpenBSD 6.6 (GENERIC.MP) #372: Sat Oct 12 10:56:27 MDT
> 2019
>
> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
>
> I've just experienced a kernel panic when resuming my laptop from
> suspend-to-RAM while my OpenBSD 6.6 VM was running; the first few lines
> of the crash read like this:
>
> panic: pvclock0: unstable result on stable clock
> Stopped at db_enter+0x10: popq %rbp
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> db_enter() at db_enter+0x10
> panic() at panic+0x128
> pvclock_get_timecount(ffffffff81f14360) at pvclock_get_timecount+0xc2
>
> The full ddb session, including backtraces for both cores, and the `ps`
> output is attached as `ddb.txt`.
So the immediate cause of the panic is this check:

	/* This bit must be set as we attached based on the stable flag */
	if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
		panic("%s: unstable result on stable clock", DEVNAME(sc));

That is, the pvclock driver currently assumes that if the clock is
advertised as stable when the OpenBSD guest is booted, then it'll remain
stable forever. That apparently is not a safe assumption across a
suspend/resume cycle on the Linux/KVM host.
To fix this, the driver would have to get the system to stop using it as
the active timecounter whenever it's marked unstable. Perhaps it could
just adjust its quality (sc->sc_tc->tc_quality) downward while that's the
case? I'm not sure if that would be enough, but you could try
implementing that.
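
A minimal user-space sketch of that idea, not the actual driver code: the
structures below are simplified stand-ins for the kernel's, and both
pvclock_update_quality() and the sc_good_quality field are hypothetical
names invented for illustration. The flag value is assumed to be bit 0.
It relies on the rule that a timecounter with negative quality is never
selected automatically. It does not address whether the kernel would
actually move away from an already-active timecounter once its quality
drops, which is the open question above.

```c
/* Assumed value: the stable flag is taken to be bit 0 of the flags byte. */
#define PVCLOCK_FLAG_TSC_STABLE	0x01

/* Simplified stand-ins for the kernel structures (illustrative only). */
struct timecounter {
	const char	*tc_name;
	int		 tc_quality;	/* negative: never chosen automatically */
};

struct pvclock_softc {
	struct timecounter	*sc_tc;
	int			 sc_good_quality; /* hypothetical: quality while stable */
};

/*
 * Hypothetical helper: instead of panicking when the stable flag is
 * clear, lower the timecounter's quality, and restore it once the flag
 * is set again.  Returns 1 if the quality changed, 0 otherwise.
 */
int
pvclock_update_quality(struct pvclock_softc *sc, unsigned int flags)
{
	int want;

	if (flags & PVCLOCK_FLAG_TSC_STABLE)
		want = sc->sc_good_quality;
	else
		want = -1;

	if (sc->sc_tc->tc_quality == want)
		return 0;

	sc->sc_tc->tc_quality = want;
	return 1;
}
```

In the driver this would be called from pvclock_get_timecount() in place
of the panic(); the caller would still have to decide what value to
return for a reading taken while the flag is clear.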
Lacking that, I guess you'll want to have KVM stop the guest before you
suspend the host, and then on resume wait a bit until the clock
settles--not sure how long that takes or how you would know--before
restarting the guest.
Philip Guenther