On 05/08/2025 01:09, Sean Christopherson wrote:
> On Mon, Aug 04, 2025, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sea...@google.com> writes:
>>> It'll take more work than the below, e.g. to have VMX's construct_eptp()
>>> pull the level and A/D bits from kvm_mmu_page (vendor code can get at the
>>> kvm_mmu_page with root_to_sp()), but for the core concept/skeleton, I
>>> think this is it?
>>>
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index 6e838cb6c9e1..298130445182 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -3839,6 +3839,37 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>>>  }
>>>  EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
>>>
>>> +struct kvm_tlb_flush_root {
>>> +	struct kvm *kvm;
>>> +	hpa_t root;
>>> +};
>>> +
>>> +static void kvm_flush_tlb_root(void *__data)
>>> +{
>>> +	struct kvm_tlb_flush_root *data = __data;
>>> +
>>> +	kvm_x86_call(flush_tlb_root)(data->kvm, data->root);
>>> +}
>>> +
>>> +void kvm_mmu_flush_all_tlbs_root(struct kvm *kvm, struct kvm_mmu_page *root)
>>> +{
>>> +	struct kvm_tlb_flush_root data = {
>>> +		.kvm = kvm,
>>> +		.root = __pa(root->spt),
>>> +	};
>>> +
>>> +	/*
>>> +	 * Flush any TLB entries for the new root, the provenance of the root
>>> +	 * is unknown.  Even if KVM ensures there are no stale TLB entries
>>> +	 * for a freed root, in theory another hypervisor could have left
>>> +	 * stale entries.  Flushing on alloc also allows KVM to skip the TLB
>>> +	 * flush when freeing a root (see kvm_tdp_mmu_put_root()), and flushing
>>> +	 * TLBs on all CPUs allows KVM to elide TLB flushes when a vCPU is
>>> +	 * migrated to a different pCPU.
>>> +	 */
>>> +	on_each_cpu(kvm_flush_tlb_root, &data, 1);
>>
>> Would it make sense to complement this with e.g. a CPU mask tracking all
>> the pCPUs where the VM has ever been seen running (+ a flush when a new
>> one is added to it)?
>>
>> I'm worried about the potential performance impact for a case when a
>> huge host is running a lot of small VMs in 'partitioning' mode
>> (i.e. when all vCPUs are pinned). Additionally, this may have a negative
>> impact on RT use-cases where each unnecessary interruption can be seen
>> as problematic.
>
> Oof, right.  And it's not even a VM-to-VM noisy neighbor problem, e.g. a few
> vCPUs using nested TDP could generate a lot of noisy IRQs through a VM.  Hrm.
>
> So I think the basic idea is so flawed/garbage that even enhancing it with
> per-VM pCPU tracking wouldn't work.  I do think you've got the right idea
> with a pCPU mask though, but instead of using a mask to scope IPIs, use it
> to elide TLB flushes.
Sorry for the delay in replying, I've been sidetracked a bit. I like this
idea more; not special-casing the TLB flushing approach per hypervisor is
preferable.

> With the TDP MMU, KVM can have at most 6 non-nested roots active at any
> given time: SMM vs. non-SMM, 4-level vs. 5-level, L1 vs. L2.  Allocating
> a cpumask for each TDP MMU root seems reasonable.  Then on task migration,
> instead of doing a global INVEPT, only INVEPT the current and prev_roots
> (because getting a new root will trigger a flush in kvm_mmu_load()), and
> skip INVEPT on TDP MMU roots if the pCPU has already done a flush for the
> root.

Just to make sure I follow: by current+prev_roots, do you mean literally
those (i.e. the cached prev roots) or all roots on kvm->arch.tdp_mmu_roots?

So this would mean: on pCPU migration, check whether the current MMU is a
TDP MMU (is_tdp_mmu_active()) and then do INVEPT-single over those roots
instead of INVEPT-global; otherwise stick to KVM_REQ_TLB_FLUSH. Would there
need to be a check for is_guest_mode(), or that the switch is coming from
vmx/nested.c? I suppose not, because nested doesn't seem to use the TDP MMU.

> Or we could do the optimized tracking for all roots.  x86 supports at most
> 8192 CPUs, which means 1KiB per root.  That doesn't seem at all painful
> given that each shadow page consumes 4KiB...

Similar question here: which roots would need to be tracked+flushed for
shadow paging? pae_roots?