On Fri, Aug 15, 2025, Jeremi Piotrowski wrote:
> On Tue, Aug 05, 2025 at 04:42:46PM -0700, Sean Christopherson wrote:
> I started working on extending patch 5, wanted to post it here to make sure
> I'm on the right track.
>
> It works in testing so far and shows promising performance - it gets rid of
> all the pathological cases I saw before.
Nice :-)

> I haven't checked whether I broke SVM yet, and I need to figure out a way to
> always keep the cpumask "offstack" so that we don't blow up every struct
> kvm_mmu_page instance with an inline cpumask - it needs to stay optional.

Doh, I meant to include an idea or two for this in my earlier response. The
best I can come up with is below.

> I also came across kvm_mmu_is_dummy_root(), that check is included in
> root_to_sp(). Can you think of any other checks that we might need to handle?

Don't think so?

> @@ -3827,6 +3829,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
>  	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
>  	++sp->root_count;
>  
> +	if (level >= PT64_ROOT_4LEVEL)

Was this my code? If so, we should move this into the VMX code, because the
fact that PAE roots can be ignored is really a detail of nested EPT, not the
overall scheme.

> +		kvm_x86_call(alloc_root_cpu_mask)(sp);

Ah shoot. Allocating here won't work, because mmu_lock is held and allocating
might sleep. I don't want to force an atomic allocation, because that can dip
into pools that KVM really shouldn't use.

The "standard" way KVM deals with this is to utilize a kvm_mmu_memory_cache.
If we do that and add e.g. kvm_vcpu_arch.mmu_roots_flushed_cache, then we can
trivially do the allocation in mmu_topup_memory_caches(). That would eliminate
the error handling in vmx_alloc_root_cpu_mask(), and might make it slightly
less awful to deal with the "offstack" cpumask.

Hmm, and then instead of calling into VMX to do the allocation, maybe just have
a flag to communicate that vendor code wants per-root flush tracking? I haven't
thought hard about SVM, but I wouldn't be surprised if SVM ends up wanting the
same functionality after we switch to per-vCPU ASIDs.
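Very roughly, something like the below is what I have in mind. Completely
untested sketch; kvm_x86_wants_root_flush_tracking() is a stand-in for whatever
form the "vendor wants per-root flush tracking" flag ends up taking, the cache
name is just illustrative, and this assumes cpu_flushed_mask becomes a plain
"struct cpumask *" so that it's always "offstack" and can simply be handed a
pre-allocated object:

	/* arch/x86/include/asm/kvm_host.h */
	struct kvm_vcpu_arch {
		...
		/*
		 * Backing storage for per-root "flushed on this pCPU" masks,
		 * topped up before mmu_lock is taken.  The cache's object size
		 * needs to be cpumask_size(), e.g. via a dedicated kmem_cache.
		 */
		struct kvm_mmu_memory_cache mmu_roots_flushed_cache;
		...
	};

	/* arch/x86/kvm/mmu/mmu.c */
	static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
	{
		int r;

		...
		/* Only pay the cost if vendor code wants per-root tracking. */
		if (kvm_x86_wants_root_flush_tracking()) {
			r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_roots_flushed_cache, 1);
			if (r)
				return r;
		}
		...
	}

	static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
				    u8 level)
	{
		...
		sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
		++sp->root_count;

		/*
		 * Grab a pre-allocated mask; no error handling needed since
		 * the cache was topped up before acquiring mmu_lock.
		 */
		if (kvm_x86_wants_root_flush_tracking() && !sp->cpu_flushed_mask)
			sp->cpu_flushed_mask =
				kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_roots_flushed_cache);

		return __pa(sp->spt);
	}

Freeing the mask would need a corresponding hook when the shadow page is
zapped, and SVM could opt in to the same flag later, but the main point is
that all of the sleepable allocations happen before mmu_lock is taken.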
> +
>  	return __pa(sp->spt);
>  }

...

> @@ -3307,22 +3309,34 @@ void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
>  		vpid_sync_context(vmx_get_current_vpid(vcpu));
>  }
>  
> -static void __vmx_flush_ept_on_pcpu_migration(hpa_t root_hpa)
> +void vmx_alloc_root_cpu_mask(struct kvm_mmu_page *root)
>  {

This should be conditioned on enable_ept.

> +	WARN_ON_ONCE(!zalloc_cpumask_var(&root->cpu_flushed_mask,
> +					 GFP_KERNEL_ACCOUNT));
> +}
> +
> +static void __vmx_flush_ept_on_pcpu_migration(hpa_t root_hpa, int cpu)
> +{
> +	struct kvm_mmu_page *root;
> +
>  	if (!VALID_PAGE(root_hpa))
>  		return;
>  
> +	root = root_to_sp(root_hpa);
> +	if (!root || cpumask_test_and_set_cpu(cpu, root->cpu_flushed_mask))

Hmm, this should flush if "root" is NULL, because the aforementioned "special"
roots don't have a shadow page. But unless I'm missing an edge case (of an
edge case), this particular code can WARN_ON_ONCE(), since EPT should never
need to use any of the special roots. We might need to filter out dummy roots
somewhere to avoid false positives, but that should be easy enough.

For the mask, it's probably worth splitting test_and_set into separate
operations, as the common case will likely be that the root has already been
used on this pCPU. The test_and_set version generates a LOCK BTS instruction,
and so for the common case where the bit is already set, KVM will do an atomic
access, which can cause noise/bottlenecks. E.g.

	if (WARN_ON_ONCE(!root))
		goto flush;

	if (cpumask_test_cpu(cpu, root->cpu_flushed_mask))
		return;

	cpumask_set_cpu(cpu, root->cpu_flushed_mask);
flush:
	vmx_flush_tlb_ept_root(root_hpa);

> +		return;
> +
>  	vmx_flush_tlb_ept_root(root_hpa);
>  }
>  
> -static void vmx_flush_ept_on_pcpu_migration(struct kvm_mmu *mmu)
> +static void vmx_flush_ept_on_pcpu_migration(struct kvm_mmu *mmu, int cpu)
> +{
>  	int i;
>  
> -	__vmx_flush_ept_on_pcpu_migration(mmu->root.hpa);
> +	__vmx_flush_ept_on_pcpu_migration(mmu->root.hpa, cpu);
>  
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> -		__vmx_flush_ept_on_pcpu_migration(mmu->prev_roots[i].hpa);
> +		__vmx_flush_ept_on_pcpu_migration(mmu->prev_roots[i].hpa, cpu);
>  }
>  
>  void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index b4596f651232..4406d53e6ebe 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -84,6 +84,7 @@ void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
>  void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
>  void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
>  void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
> +void vmx_alloc_root_cpu_mask(struct kvm_mmu_page *root);
>  void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
>  u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu);
>  void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall);
> --
> 2.39.5
>