On Tue, Oct 22, 2019 at 04:52:08PM -0700, Mike Larkin wrote:
> On Tue, Oct 22, 2019 at 04:25:19PM -0700, [email protected] wrote:
> > On Tue, 22 Oct 2019, Andreas Rottmann wrote:
> > > >Synopsis:        panic: pvclock0: unstable result on stable clock
> > > >Category:        virtualization
> > > >Environment:
> > >   System      : OpenBSD 6.6
> > >   Details     : OpenBSD 6.6 (GENERIC.MP) #372: Sat Oct 12 10:56:27 MDT 2019
> > >                    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine     : amd64
> > > >Description:
> > > 
> > > I've just experienced a kernel panic when resuming my laptop from 
> > > suspend-to-RAM while my OpenBSD 6.6 VM was running; the first few lines 
> > > of the crash read like this:
> > > 
> > > panic: pvclock0: unstable result on stable clock
> > > Stopped at      db_enter+0x10:  popq    %rbp
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > db_enter() at db_enter+0x10
> > > panic() at panic+0x128
> > > pvclock_get_timecount(ffffffff81f14360) at pvclock_get_timecount+0xc2
> > > 
> > > The full ddb session, including backtraces for both cores, and the `ps`
> > > output is attached as `ddb.txt`.
> > 
> > So the immediate cause of the panic is this code:
> >         /* This bit must be set as we attached based on the stable flag */
> >         if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
> >                 panic("%s: unstable result on stable clock", DEVNAME(sc));
> > 
> > That is, the pvclock driver currently assumes that if the host advertises 
> > a stable clock when the OpenBSD guest is booted, then the clock will remain 
> > stable forever.  That apparently is not a safe assumption across a 
> > suspend/resume cycle in the Linux/KVM host.
> > 
> 
> It probably isn't a safe assumption in a live migration scenario
> either, if you're correct above.
> 
> -ml
> 
> > To fix this, the driver would have to get the system to stop using it as 
> > the active timecounter whenever it's marked unstable.  Perhaps it could 
> > just adjust its quality (sc->sc_tc->tc_quality) downward while that's the 
> > case?  I'm not sure if that would be enough, but you could try 
> > implementing that.
> > 
> > Lacking that, I guess you'll want to have KVM stop the guest before you 
> > suspend the host, and then on resume wait a bit until the clock 
> > settles--not sure how long that takes or how you would know--before 
> > restarting the guest.
> > 
> > 
> > Philip Guenther
> > 
> 
For the archives, the following commit fixed this panic.
I can now successfully suspend and resume a Linux host running OpenBSD 6.6
snapshots under QEMU/KVM:

RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
Working file: dev/pv/pvclock.c
head: 1.5
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 5;     selected revisions: 5
description:
----------------------------
revision 1.5
date: 2019/12/13 06:43:46;  author: pd;  state: Exp;  lines: +20 -12;  
commitid: mpL92Q7XX7jEvgkn;
pvclock(4): attach even if when PVCLOCK_FLAG_TSC_STABLE is unset

Attaches pvclock with lower priority (500) in case of unstable tsc
(PVCLOCK_FLAG_TSC_STABLE) instead of not attaching at all.  In this state, we do
make sure to return a monotonically increasing number.

This mostly helps openbsd guests on openbsd vmm(4) where a pvclock with unstable
tsc is still better than i8254.

ok mlarkin@
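
To make the commit message concrete, here is a minimal sketch of the two
ideas it describes: choosing a lower quality when the stable flag is unset,
and never returning a value smaller than the previous one.  This is not the
code from pvclock.c.  Every identifier below is hypothetical, the "stable"
quality value is an assumption, and only the 500 comes from the log above.
The sketch is also single-threaded; a real driver would likely need atomic
updates of the saved value.

```c
#include <stdint.h>

/* Hypothetical names throughout; not the identifiers used in pvclock.c. */
#define SKETCH_FLAG_TSC_STABLE	0x01	/* stand-in for PVCLOCK_FLAG_TSC_STABLE */
#define SKETCH_QUALITY_STABLE	2000	/* assumed value, for illustration only */
#define SKETCH_QUALITY_UNSTABLE	500	/* the priority named in the commit log */

struct sketch_clock {
	uint64_t last;	/* most recent value handed out */
};

/* Pick a timecounter quality from the flags instead of refusing to attach. */
static int
sketch_quality(uint8_t flags)
{
	return (flags & SKETCH_FLAG_TSC_STABLE) ?
	    SKETCH_QUALITY_STABLE : SKETCH_QUALITY_UNSTABLE;
}

/*
 * Return a reading that does not go backwards while the clock is unstable.
 * A stable reading is trusted as-is; an unstable reading that is lower
 * than the last value returned is replaced by that last value.
 */
static uint64_t
sketch_read(struct sketch_clock *c, uint8_t flags, uint64_t raw)
{
	if ((flags & SKETCH_FLAG_TSC_STABLE) == 0 && raw < c->last)
		return c->last;
	c->last = raw;
	return raw;
}
```

With the flag unset, feeding sketch_read() the raw values 100, 90, 120 in
that order yields 100, 100, 120: the backwards step is absorbed rather than
triggering a panic.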
