kvmclock: Use TSC for sched_clock if it's constant and non-stop

Sean Christopherson Mon, 18 May 2026 13:34:37 -0700

On Fri, May 15, 2026, [email protected] wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) 
> to consider:
> - [High] Bypassing `kvm_sched_clock_init()` when `CONSTANT_TSC` is present
> ignores KVM's `PVCLOCK_TSC_STABLE_BIT`, leading to unfiltered backward time
> jumps in the core scheduler.  --
> 
> commit 57cf68835ef70733c037ca9ffa04fd39e1d70027
> Author: Sean Christopherson <[email protected]>
> 
> x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
> 
> This commit updates kvmclock_init to prefer the TSC over kvmclock for
> sched_clock when the TSC is constant, non-stop, and not marked unstable.
> It does this by bypassing kvm_sched_clock_init and instead modifying the
> TSC properties so that TSC becomes the preferred clocksource.
> 
> Link: https://lore.kernel.org/all/[email protected]
> 
> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> > --- a/arch/x86/kernel/kvmclock.c
> > +++ b/arch/x86/kernel/kvmclock.c
> [ ... ]
> > @@ -416,22 +416,22 @@ void __init kvmclock_init(void)
> [ ... ]
> >     if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >         boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> >         !check_tsc_unstable()) {
> >             kvm_clock.rating = 299;
> >             tsc_properties = TSC_FREQ_KNOWN_AND_RELIABLE;
> > +   } else {
> > +           kvm_sched_clock_init(stable);
> >     }
> >  
> > -   kvm_sched_clock_init(stable);
> 
> Does bypassing kvm_sched_clock_init() when CONSTANT_TSC is present
> accidentally ignore KVM's PVCLOCK_TSC_STABLE_BIT?
> 
> CONSTANT_TSC indicates the hardware TSC rate doesn't change with power
> states, but it doesn't guarantee cross-CPU synchronization, such as across
> multi-socket systems or differing vCPU offsets from hotplug and migration.
> KVM explicitly signals this cross-CPU stability via PVCLOCK_TSC_STABLE_BIT,
> which is mapped to the stable variable.
> 
> Prior to this patch, kvm_sched_clock_init() was always called, invoking
> __paravirt_set_sched_clock() with the stable flag. This allowed the system
> to call clear_sched_clock_stable() if the hypervisor could not guarantee
> synchronization, enabling crucial per-CPU monotonicity filters in
> sched_clock_cpu().
> 
> By skipping kvm_sched_clock_init() entirely and registering
> TSC_FREQ_KNOWN_AND_RELIABLE, the kernel's native cross-CPU TSC synchronization
> checks and the clocksource watchdog appear to be bypassed.


This is very much intented behavior.

> If KVM clears PVCLOCK_TSC_STABLE_BIT, could this result in the guest
> ignoring the condition and using raw, unfiltered rdtsc() values?

Yes, that's the entire point.
 
Note, PVCLOCK_TSC_STABLE_BIT is still set in 
pvclock_read_flags(&hv_clock_boot[0].pvti),
so anything else that directly consumes PVCLOCK_TSC_STABLE_BIT still does the
right thing.  E.g. kvm_setup_vsyscall_timeinfo() will still set kvm_clock's vDSO
mode to VDSO_CLOCKMODE_PVCLOCK, to communicate that kvm_lock can be used for the
vDSO page, *if* it's chose.

> When tasks migrate between vCPUs with differing TSC offsets, this regression
> might cause backward time jumps in sched_clock(), which breaks rq_clock
> monotonicity and corrupts CFS vruntime and min_vruntime calculations.

Only if the host messed up and incorrectly advertised CONSTANT+NONSTOP.

Re: [PATCH v3 37/41] x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop

Reply via email to