Restore the fast invalidate flow for zapping shadow pages and use it whenever vCPUs can be active in the VM. This fixes (in theory, not yet confirmed) a regression reported by James Harvey where KVM can livelock in kvm_mmu_zap_all() when it's invoked in response to a memslot update.
The fast invalidate flow was removed as it was deemed to be unnecessary after its primary user, memslot flushing, was reworked to zap only the memslot in question instead of all shadow pages. Unfortunately, zapping only the memslot being (re)moved during a memslot update introduced a regression for VMs with assigned devices. Because we could not discern why zapping only the relevant memslot broke device assignment, or if the regression extended beyond device assignment, we reverted to zapping all shadow pages when a memslot is (re)moved. The revert to "zap all" failed to account for subsequent changes that have been made to kvm_mmu_zap_all() between then and now. Specifically, kvm_mmu_zap_all() now conditionally drops reschedules and drops mmu_lock if a reschedule is needed or if the lock is contended. Dropping the lock allows other vCPUs to add shadow pages, and, with enough vCPUs, can cause kvm_mmu_zap_all() to get stuck in an infinite loop as it can never zap all pages before observing lock contention or the need to reschedule. The reasoning behind having kvm_mmu_zap_all() conditionally reschedule was that it would only be used when the VM is inaccesible, e.g. when its mm_struct is dying or when the VM itself is being destroyed. In that case, playing nice with the rest of the kernel instead of hogging cycles to free unused shadow pages made sense. Since it's unlikely we'll root cause the device assignment regression any time soon, and that simply removing the conditional rescheduling isn't guaranteed to return us to a known good state, restore the fast invalidate flow for zapping on memslot updates, including mmio generation wraparound. Opportunisticaly tack on a bug fix and a couple enhancements. Alex and James, it probably goes without saying... please test, especially patch 01/11 as a standalone patch as that'll likely need to be applied to stable branches, assuming it works. Thanks! Sean Christopherson (11): KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot KVM: x86/mmu: Treat invalid shadow pages as obsolete KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints"" KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages"" KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch"" KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages"" KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first"" KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call" KVM: x86/mmu: Explicitly track only a single invalid mmu generation KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero arch/x86/include/asm/kvm_host.h | 4 +- arch/x86/kvm/mmu.c | 154 ++++++++++++++++++++++++++++---- arch/x86/kvm/mmutrace.h | 42 +++++++-- arch/x86/kvm/x86.c | 1 + 4 files changed, 173 insertions(+), 28 deletions(-) -- 2.22.0