On 5/10/26 11:40 AM, David Woodhouse wrote:
> On Sun, 2026-05-10 at 18:09 +0100, David Woodhouse wrote:
>> On Fri, 2026-05-08 at 15:40 -0700, Sean Christopherson wrote:
>>> On Mon, May 04, 2026, Dongli Zhang wrote:
>>>> KVM does not support vCPU hotplug. When a vCPU is removed, its
>>>> corresponding data structures are not freed by KVM. Instead, QEMU destroys
>>>> only the userspace state and the vCPU thread, while the KVM vCPU fd remains
>>>> open and parked in QEMU.
>>>>
>>>> As a result, vcpu->arch.st.last_steal is not reset.
>>>>
>>>> If the same vCPU is later re-created by QEMU, last_steal retains its old
>>>> value, while current->sched_info.run_delay starts from zero since a new
>>>> vCPU thread is created. This causes
>>>> current->sched_info.run_delay - vcpu->arch.st.last_steal to produce a
>>>> large, bogus value.
>>>>
>>>> Fix this by resetting vcpu->arch.st.last_steal to
>>>> current->sched_info.run_delay when KVM steal time is enabled.
>>>
>>> This is quite arbitrary.  E.g. if userspace hands the vCPU off to a
>>> different task without going through QEMU's hotplug dance, then
>>> current->sched_info.run_delay will also change.
>>>
>>> Shouldn't x86 hook kvm_arch_vcpu_run_pid_change() and reset last_steal
>>> in there?
>>
>> I'd like to be sure that we get this right for live update and live
>> migration.
>>
>> I think we *do* get it right for the Xen runstate info...
> 
> Since I'm adding selftests to my kvmclock branch today... I now *know*
> this to be true :)
> 
> 
> https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=d667349116
> 
> Looks like Sean is right about the pid change though.

Thank you very much for the reminder! Based on my understanding of the source
code, the delta between vcpu->arch.st.last_steal and
current->sched_info.run_delay (from the new thread) won't be accounted as
steal time if we handle it properly in kvm_arch_vcpu_run_pid_change(). I may
also add that validation to the selftest.

Thank you very much!

Dongli Zhang
