On Tue, Oct 22, 2019 at 04:52:08PM -0700, Mike Larkin wrote:
> On Tue, Oct 22, 2019 at 04:25:19PM -0700, [email protected] wrote:
> > On Tue, 22 Oct 2019, Andreas Rottmann wrote:
> > > >Synopsis: panic: pvclock0: unstable result on stable clock
> > > >Category: virtualization
> > > >Environment:
> > > System : OpenBSD 6.6
> > > Details : OpenBSD 6.6 (GENERIC.MP) #372: Sat Oct 12 10:56:27 MDT
> > > 2019
> > >
> > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > >Description:
> > >
> > > I've just experienced a kernel panic when resuming my laptop from
> > > suspend-to-RAM while my OpenBSD 6.6 VM was running; the first few lines
> > > of the crash read like this:
> > >
> > > panic: pvclock0: unstable result on stable clock
> > > Stopped at db_enter+0x10: popq %rbp
> > > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > > db_enter() at db_enter+0x10
> > > panic() at panic+0x128
> > > pvclock_get_timecount(ffffffff81f14360) at pvclock_get_timecount+0xc2
> > >
> > > The full ddb session, including backtraces for both cores, and the `ps`
> > > output is attached as `ddb.txt`.
> >
> > So the immediate code of the panic is this:
> >         /* This bit must be set as we attached based on the stable flag */
> >         if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
> >                 panic("%s: unstable result on stable clock", DEVNAME(sc));
> >
> > That is, the pvclock driver currently assumes that if it advertises a
> > stable clock when the OpenBSD guest is booted, then it'll remain stable
> > forever. That apparently is not a safe assumption across a suspend/resume
> > cycle in the Linux/KVM host.
> >
>
> It probably also isn't a safe assumption in a live migration scenario,
> either, if you're correct above.
>
> -ml
>
> > To fix this, the driver would have to get the system to stop using it as
> > the active timecounter whenever its marked instable. Perhaps it could
> > just adjust its quality (sc->sc_tc->tc_quality) downward while that's the
> > case? I'm not sure if that would be enough, but you could try
> > implementing that.
> >
> > Lacking that, I guess you'll want to have KVM stop the guest before you
> > suspend the host, and then on resume wait a bit until the clock
> > settles--not sure how long that takes or how you would know--before
> > restarting the guest.
> >
> >
> > Philip Guenther
> >
>

For the archives, the following commit fixed this panic. I can
successfully suspend and resume a Linux host running OpenBSD 6.6
snapshots under QEMU/KVM:

RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
Working file: dev/pv/pvclock.c
head: 1.5
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 5; selected revisions: 5
description:
----------------------------
revision 1.5
date: 2019/12/13 06:43:46; author: pd; state: Exp; lines: +20 -12; commitid: mpL92Q7XX7jEvgkn;
pvclock(4): attach even if when PVCLOCK_FLAG_TSC_STABLE is unset

Attaches pvclock with lower priority (500) in case of unstable tsc
(PVCLOCK_FLAG_TSC_STABLE) instead of not attaching at all. In this
state, we do make sure to return a monotonically increasing number.

This mostly helps openbsd guests on openbsd vmm(4) where a pvclock with
unstable tsc is still better than i8254.

ok mlarkin@
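
For anyone reading along without the source handy, here is a rough
userland sketch of the idea the log message describes: pick a lower
quality when the stable flag is unset, and clamp the returned count so
it never goes backwards while in that state. This is not the code from
revision 1.5. The struct, the function names, the flag's bit value and
the "stable" quality of 1500 are all assumptions made up for
illustration; only the flag name and the 500 come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag name is from the thread; the bit value is assumed for this sketch. */
#define PVCLOCK_FLAG_TSC_STABLE	0x01

#define SKETCH_QUALITY_STABLE	1500	/* hypothetical value */
#define SKETCH_QUALITY_UNSTABLE	500	/* "lower priority (500)" per the log */

/* Hypothetical per-clock state: the last count handed to the caller. */
struct pvclock_sketch {
	uint64_t last;
};

/* Choose a timecounter quality instead of refusing to attach. */
static int
sketch_quality(uint8_t flags)
{
	if (flags & PVCLOCK_FLAG_TSC_STABLE)
		return SKETCH_QUALITY_STABLE;
	return SKETCH_QUALITY_UNSTABLE;
}

/*
 * Return a count for the raw reading. While the stable flag is unset,
 * a reading lower than the previous result is replaced by the previous
 * result, so callers never see time step backwards.
 */
static uint64_t
sketch_read(struct pvclock_sketch *st, uint64_t raw, uint8_t flags)
{
	assert(st != NULL);

	if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0 && raw < st->last)
		raw = st->last;
	st->last = raw;
	return raw;
}
```

With a zeroed struct pvclock_sketch, feeding readings of 100, 90 and 120
with the flag unset yields 100, 100 and 120. The real driver also has to
deal with concurrent readers on multiple CPUs, which this sketch ignores.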
