This is v5 of the series to clean up the KVM clock, rebased onto
tip/timers/ptp (which now includes Thomas's ktime snapshot series and
the read_snapshot patches for hyperv, kvmclock, and vmclock).

The KVM clock has historically suffered from three problems:

1. Imprecision: get_kvmclock_ns() computed the clock from the *host*
TSC without applying guest TSC scaling, causing systematic drift from
the values the guest computes from its own TSC.
2. Unnecessary discontinuities: gratuitous KVM_REQ_MASTERCLOCK_UPDATE
requests caused the master clock reference point to be re-snapshotted,
yanking the guest's clock due to arithmetic precision differences.
3. No precise migration API: the existing KVM_[GS]ET_CLOCK only allows
setting the clock at a given UTC reference time, which is necessarily
imprecise. There was no way to preserve the exact arithmetic
relationship between guest TSC and KVM clock across live migration.
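
To illustrate problem 1, here is a minimal userspace sketch (not code from this series). It assumes the usual pvclock mul/shift conversion and a TSC scaling ratio with 48 fractional bits. The sketch_* and ns_from_* helpers are hypothetical, simplified stand-ins loosely modelled on pvclock_scale_delta(), kvm_get_time_scale() and kvm_scale_tsc(), and the frequencies in the note below are illustrative only.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * pvclock-style conversion: shift the TSC delta, then multiply by a
 * 32-bit binary fraction (hypothetical helper for illustration).
 */
static uint64_t sketch_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/*
 * Pick mul/shift for a TSC frequency in Hz (must be nonzero) so that
 * mul lands in [2^31, 2^32). Simplified; the real kernel helper differs.
 */
static void sketch_time_scale(uint64_t hz, uint32_t *mul, int8_t *shift)
{
	int8_t s = 0;

	while (hz > 2 * NSEC_PER_SEC) {
		hz >>= 1;
		s--;
	}
	while (hz <= NSEC_PER_SEC) {
		hz <<= 1;
		s++;
	}
	*mul = (uint32_t)((NSEC_PER_SEC << 32) / hz);
	*shift = s;
}

/* Guest TSC = host TSC * ratio, assuming 48 fractional bits in the ratio. */
static uint64_t sketch_scale_tsc(uint64_t host_tsc, uint64_t host_hz,
				 uint64_t guest_hz)
{
	uint64_t ratio = (uint64_t)(((unsigned __int128)guest_hz << 48) / host_hz);

	return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> 48);
}

/* Host-side view: nanoseconds derived from the *host* TSC and frequency. */
static uint64_t ns_from_host_tsc(uint64_t host_delta, uint64_t host_hz)
{
	uint32_t mul;
	int8_t shift;

	sketch_time_scale(host_hz, &mul, &shift);
	return sketch_scale_delta(host_delta, mul, shift);
}

/* Guest-side view: scale to the guest TSC, then use the guest's mul/shift. */
static uint64_t ns_from_guest_tsc(uint64_t host_delta, uint64_t host_hz,
				  uint64_t guest_hz)
{
	uint32_t mul;
	int8_t shift;

	sketch_time_scale(guest_hz, &mul, &shift);
	return sketch_scale_delta(sketch_scale_tsc(host_delta, host_hz, guest_hz),
				  mul, shift);
}
```

Under this model, with a 2 GHz host and a guest scaled to 3 GHz, one hour of host TSC ticks (7200000000000) converts to 3600000000000 ns on the host path but 3599999999161 ns on the guest path. The two views differ by 839 ns purely from fixed-point rounding, which is the kind of divergence problem 1 describes.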

This series addresses all three, and adds new APIs for precise clock
migration and TSC frequency reporting. As a bonus, it now rips out the
whole pvclock_gtod_data hack, which was shadowing the kernel's
timekeeping, and uses ktime snapshots as $DEITY (well, Thomas) intended.

Changes since v4:
- Rebased onto tip/timers/ptp (includes ktime snapshot infrastructure)
- Dropped "WARN if kvm_get_walltime_and_clockread() fails" — the WARN
was spurious during clocksource transitions
- Dropped guest-side "Obtain TSC frequency from CPUID" patches (adopted
by Sean for a separate series)
- Dropped KVM_VCPU_TSC_EFFECTIVE_FREQ
- Fixed false re-enabling of master clock when a single vCPU syncs
multiple times at a mismatched frequency: introduced per-vCPU
cur_tsc_freq_generation counter so each vCPU is counted exactly once
- Unified nr_vcpus_matched_tsc and nr_vcpus_matched_freq to use the
same counting convention (1-based, >= online_vcpus threshold)
- "Avoid gratuitous global clock updates": kept the global update on
vCPU load in non-master-clock mode (CLOCK_MONOTONIC_RAW means there is
no NTP drift to correct, but this preserves the existing safety net);
only the master clock path is optimized
- "Xen runstate negative time": refined to update the state but not
account time when the clock goes backwards; last_steal and the guest
shared page are always updated
- Added "Activate master clock immediately on vCPU creation" to avoid
an unnecessary non-master-clock window during VM setup
- New final patches: use ktime_get_snapshot_id() for master clock
reference, then remove pvclock_gtod_data entirely (replaced by direct
ktime_get_raw() + offs_boot computation)
- Added masterclock_offset_test selftest (verifies kvmclock consistency
across vCPUs with different TSC offsets)
- Added xen_cpuid_timing_test selftest
- Added pvclock_migration_test selftest
- Addressed AI reviewer (Sashiko) feedback throughout:
  - get_kvmclock(): goto fallback on clock read failure instead of
    using uninitialized data; single #ifdef CONFIG_X86_64 block
  - kvm_synchronize_tsc(): changed ns to s64 to match function
    signature; moved time reads inside tsc_write_lock
  - Kill last_tsc fields: use kvm_scale_tsc() subtraction for
    backwards TSC instead of zeroing cur_tsc_write
  - KVM_[GS]ET_CLOCK_GUEST: validate padding fields, bounds-check
    tsc_shift
  - pvclock selftest: seqcount loop for torn-read safety, per-vCPU
    pvclock addresses, graceful skip when caps unavailable
  - KVM_VCPU_TSC_SCALE: return -ENXIO when !has_tsc_control
  - UAPI pvclock-abi: added -D__KERNEL__ to xen-hypercalls.sh
  - VMX: also clear SECONDARY_EXEC_TSC_SCALING from vmcs_config
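
For readers unfamiliar with the torn-read issue mentioned for the pvclock selftest, the following is a minimal sketch of a version-checked read loop. It is not the selftest code from this series. The struct mirrors the pvclock ABI field layout, pvclock_read_stable() is a hypothetical name, and C11 acquire fences stand in for the kernel's read barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Field layout of the pvclock per-vCPU time structure (32 bytes). */
struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
};

_Static_assert(sizeof(struct pvclock_vcpu_time_info) == 32,
	       "unexpected pvclock structure size");

/*
 * Copy a consistent snapshot of *src into *dst (hypothetical helper).
 * The writer makes the version odd while updating and even when done,
 * so retry whenever the version is odd or changed during the copy.
 */
static void pvclock_read_stable(const volatile struct pvclock_vcpu_time_info *src,
				struct pvclock_vcpu_time_info *dst)
{
	uint32_t version;

	do {
		/* Wait out any update that is currently in progress. */
		do {
			version = src->version;
		} while (version & 1);
		atomic_thread_fence(memory_order_acquire);

		dst->tsc_timestamp = src->tsc_timestamp;
		dst->system_time = src->system_time;
		dst->tsc_to_system_mul = src->tsc_to_system_mul;
		dst->tsc_shift = src->tsc_shift;
		dst->flags = src->flags;

		/* Order the field reads before the version re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (src->version != version);

	dst->version = version;
}
```

The point of the loop is that the fields are only trusted if the version was even and unchanged across the whole copy; otherwise the snapshot may mix values from two different updates.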

David Woodhouse (31):
KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
KVM: x86: Activate master clock immediately on vCPU creation
KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
KVM: x86: Fold __get_kvmclock() into get_kvmclock()
KVM: x86: Restructure get_kvmclock()
KVM: x86: Fix KVM clock precision in get_kvmclock() with TSC scaling
KVM: x86: Use get_kvmclock() in kvm_get_wall_clock_epoch()
KVM: x86: Fix compute_guest_tsc() to handle negative time deltas
KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling
KVM: x86: Simplify and comment kvm_get_time_scale()
KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
KVM: x86: Improve synchronization in kvm_synchronize_tsc()
KVM: x86: Kill last_tsc_{nsec,write,offset} fields
KVM: x86: Replace nr_vcpus_matched_tsc count with all_vcpus_matched_tsc bool
KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
KVM: x86: Factor out kvm_use_master_clock()
KVM: x86: Avoid gratuitous global clock updates
KVM: x86/xen: Prevent runstate times from becoming negative
KVM: x86: Avoid redundant masterclock updates from multiple vCPUs
KVM: x86: Remove runtime Xen TSC frequency CPUID update
KVM: x86: Re-synchronize TSC after KVM_SET_TSC_KHZ
KVM: x86: Use ktime_get_snapshot_id() for master clock
KVM: x86: Compute kvmclock base without pvclock_gtod_data
KVM: x86: Replace pvclock_gtod_data vclock_mode with boolean
KVM: x86: Remove pvclock_gtod_data and private timekeeping code
KVM: selftests: Add master clock offset test
KVM: selftests: Add Xen/generic CPUID timing leaf test
KVM: selftests: Add Xen runstate migration test

Jack Allister (3):
UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
KVM: selftests: Add KVM/PV clock selftest to prove timer correction

Documentation/virt/kvm/api.rst | 37 +
Documentation/virt/kvm/devices/vcpu.rst | 119 ++-
MAINTAINERS | 4 +-
arch/x86/include/asm/kvm_host.h | 16 +-
arch/x86/include/uapi/asm/kvm.h | 6 +
arch/x86/include/{ => uapi}/asm/pvclock-abi.h | 27 +-
arch/x86/kvm/cpuid.c | 16 -
arch/x86/kvm/svm/svm.c | 3 +-
arch/x86/kvm/vmx/vmx.c | 4 +-
arch/x86/kvm/x86.c | 1039 ++++++++++++--------
arch/x86/kvm/xen.c | 30 +-
arch/x86/kvm/xen.h | 13 -
include/uapi/linux/kvm.h | 3 +
scripts/xen-hypercalls.sh | 2 +-
tools/testing/selftests/kvm/Makefile.kvm | 4 +
.../selftests/kvm/x86/masterclock_offset_test.c | 180 ++++
.../selftests/kvm/x86/pvclock_migration_test.c | 382 +++++++
tools/testing/selftests/kvm/x86/pvclock_test.c | 441 +++++++++
.../selftests/kvm/x86/xen_cpuid_timing_test.c | 230 +++++
.../testing/selftests/kvm/x86/xen_migration_test.c | 194 ++++
20 files changed, 2263 insertions(+), 487 deletions(-)

base-commit: bc484a5096732cd858771cccd3164ec985bdc03d