On Mon, 2026-05-04 at 17:30 -0700, Dongli Zhang wrote:
> The KVM_CLOCK_REALTIME has been introduced to help track the downtime of
> live migration. KVM uses that realtime value to advance guest clock, but
> the same blackout is not reflected in KVM steal time.
>
> Account that same delta in steal time directly in kvm_vm_ioctl_set_clock(),
> only when KVM_CLOCK_REALTIME is used. This keeps the KVM-only solution
> self-contained and avoids adding a new KVM ioctl or requiring additional
> userspace changes (i.e. QEMU).
>
> Record the per-VM downtime delta when KVM_SET_CLOCK receives
> KVM_CLOCK_REALTIME, and fold it into the existing x86 steal accounting
> path. Initialize each vCPU's local cursor
> (vcpu->arch.st.last_downtime_steal) when the guest enables
> MSR_KVM_STEAL_TIME so previously accumulated blackout is not charged.
>
> Note that this means a vCPU may observe additional steal time after
> blackout even if the host side contribution from current->sched_info
> did not increase during that interval.
>
> Signed-off-by: Dongli Zhang <[email protected]>

I really don't want to see KVM_CLOCK_REALTIME used for anything more
than it already is. Or, indeed, even for that.

There is precisely *one* place where it's OK to use 'real time' as a
comparator, and that's when setting the guest's TSC. And even then it
should be using TAI, not UTC, unless you like your guests' clocks
jumping around by a second if you migrate at the wrong time.

KVM_CLOCK_REALTIME was never the right thing to use, for anything. The
KVM clock is a function of the guest's TSC (see KVM_SET_CLOCK_GUEST),
and steal time is a function of that (as it's measured in nanoseconds).
Don't bring UTC into it *anywhere*.
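
To make "a function of the guest's TSC" concrete, here is a minimal sketch of the usual pvclock scaling from a guest TSC reading to KVM clock nanoseconds. The struct name pvclock_sample and the helper kvmclock_ns() are made up for illustration and are not kernel identifiers; the fields only mirror what the pvclock ABI publishes per vCPU. It also assumes a compiler with unsigned __int128 (GCC or Clang on a 64-bit target).

```c
#include <stdint.h>

/* Hypothetical mirror of the per-vCPU pvclock parameters. */
struct pvclock_sample {
	uint64_t tsc_timestamp;     /* guest TSC when the sample was taken */
	uint64_t system_time;       /* KVM clock (ns) at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* KVM clock in nanoseconds for a given guest TSC reading. */
static uint64_t kvmclock_ns(const struct pvclock_sample *s, uint64_t guest_tsc)
{
	uint64_t delta = guest_tsc - s->tsc_timestamp;

	/* Positive shift scales the delta up, negative scales it down. */
	if (s->tsc_shift >= 0)
		delta <<= s->tsc_shift;
	else
		delta >>= -s->tsc_shift;

	/* 64x32-bit multiply, then drop the 32 fractional bits. */
	return s->system_time +
	       (uint64_t)(((unsigned __int128)delta * s->tsc_to_system_mul) >> 32);
}
```

The only inputs are the guest TSC and the scaling parameters; no wall-clock value, UTC or otherwise, appears anywhere in the computation.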
