On 2026/2/27 02:24, Sean Christopherson wrote:
On Thu, Feb 26, 2026, Lance Yang wrote:
On 2026/2/26 04:11, Sean Christopherson wrote:
On Mon, Feb 02, 2026, Lance Yang wrote:
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 37dc8465e0f5..6a5e47ee4eb6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -856,6 +856,12 @@ static void __init kvm_guest_init(void)
   #ifdef CONFIG_SMP
        if (pv_tlb_flush_supported()) {
                pv_ops.mmu.flush_tlb_multi = kvm_flush_tlb_multi;
+               /*
+                * KVM's flush implementation calls native_flush_tlb_multi(),
+                * which sends real IPIs when INVLPGB is not available.

Not on all (virtual) CPUs.  The entire point of KVM's PV TLB flush is to elide
the IPIs.  If a vCPU was scheduled out by the host, the guest sets a flag and
relies on the host to flush the TLB on behalf of the guest prior to the next
VM-Enter.

Ah, I see. Thanks for the correction!

KVM only sends IPIs to running vCPUs; preempted ones are left out of the mask
and flushed on VM-Enter. So the old comment was wrong ...

IIUC, we still set the flag to true because only running vCPUs can be in a
software/lockless walk, and they all get the IPI, so the flush is enough.
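
Roughly what kvm_flush_tlb_multi() does, as I read it -- a hand-written
sketch of arch/x86/kernel/kvm.c, not verbatim kernel code:

static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
				const struct flush_tlb_info *info)
{
	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
	struct kvm_steal_time *src;
	u8 state;
	int cpu;

	cpumask_copy(flushmask, cpumask);

	for_each_cpu(cpu, flushmask) {
		src = &per_cpu(steal_time, cpu);
		state = READ_ONCE(src->preempted);
		/*
		 * Preempted vCPUs get KVM_VCPU_FLUSH_TLB set in their
		 * steal_time record and are dropped from the IPI mask;
		 * the host flushes them before the next VM-Enter instead.
		 */
		if ((state & KVM_VCPU_PREEMPTED) &&
		    try_cmpxchg(&src->preempted, &state,
				state | KVM_VCPU_FLUSH_TLB))
			__cpumask_clear_cpu(cpu, flushmask);
	}

	/* Only the still-running vCPUs are actually IPI'd here. */
	native_flush_tlb_multi(flushmask, info);
}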

Does that match what you had in mind?

No, because from the guest kernel's perspective, the vCPU is running.  The kernel
can't make any assumptions about what code the vCPU was executing when the vCPU
was preempted by the host scheduler, i.e. it's entirely possible the vCPU is in
a software/lockless walk.

Thanks a lot for setting me straight!

So any PV implementation that does its own thing and doesn't call
native_flush_tlb_multi() directly cannot be trusted to provide the
IPI guarantees we need.

So we should only set the flag on the native path, which actually calls
native_flush_tlb_multi() directly.
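
Concretely, something like this for the kvm_guest_init() hunk -- just a
sketch of the idea; the "flush sends IPIs" flag named in the comment is
a placeholder of mine, not what the series actually calls it:

	#ifdef CONFIG_SMP
		if (pv_tlb_flush_supported()) {
			pv_ops.mmu.flush_tlb_multi = kvm_flush_tlb_multi;
			/*
			 * kvm_flush_tlb_multi() drops preempted vCPUs from
			 * the IPI mask and defers their flush to the next
			 * VM-Enter, so it does NOT guarantee an IPI to every
			 * target CPU.  Leave the (hypothetical) "flush sends
			 * IPIs" flag unset here; it is only set when
			 * flush_tlb_multi() stays native.
			 */
		}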


Have a great weekend,
Lance
