On Wed, May 20, 2026, Fuad Tabba wrote:
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <[email protected]> wrote:
> >
> > From: Ackerley Tng <[email protected]>
> >
> > When the maximum mapping level is queried, KVM's MMU lock is held, and
> > while the MMU lock is held, guest_memfd cannot take the
> > filemap_invalidate_lock() to look up the current shared/private state of
> > the gfn, for these reasons:
> >
> > + The MMU lock is a spinlock or rwlock and cannot be held while taking a
> >   lock that can sleep.
> > + In guest_memfd's code paths (such as truncate), the
> >   filemap_invalidate_lock() is held while taking the MMU lock, and taking
> >   the locks in reverse order would introduce a AB-BA deadlock.
> >
> > Currently, the maximum mapping level is only queried from guest_memfd in
> > the process of recovering huge pages, if dirty logging is disabled on a
> > memslot. Dirty logging is not currently supported for guest_memfd, and
> > guest_memfd memslots also cannot be updated.
> >
> > For now, bug the VM if guest_memfd needs to be queried to determine the
> > maximum mapping level. This guard can be removed if/when support is added.
> >
> > Signed-off-by: Ackerley Tng <[email protected]>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a80a876ab4ad6..153bcc5369985 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3357,6 +3357,15 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, 
> > struct kvm_page_fault *fault,
> >                 max_level = fault->max_level;
> >                 is_private = fault->is_private;
> >         } else {
> > +               /*
> > +                * Memory attributes cannot be obtained from guest_memfd 
> > while
> > +                * the MMU lock is held.
> > +                */
> > +               if 
> > (KVM_BUG_ON(static_call_query(__kvm_get_memory_attributes) ==
> > +                              kvm_gmem_get_memory_attributes, kvm)) {
> > +                       return 0;
> > +               }
> > +
> 
> This directly takes the address of kvm_gmem_get_memory_attributes,
> which is only compiled if CONFIG_KVM_GUEST_MEMFD=y. This breaks
> ARCH=i386.

And this bleeds guest_memfd implementation details into places they don't 
belong.
The right way to deal with this is to use lockdep_assert_not_held() in whatever
code mustn't run with mmu_lock held.  E.g.

diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index c9f155c2dc5c..3bea9c1137ef 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -547,6 +547,9 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm 
*kvm, gfn_t gfn)
        struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
        struct inode *inode;
 
+       /* Comment goes here. */
+       lockdep_assert_not_held(&kvm->mmu_lock);
+
        /*
         * If this gfn has no associated memslot, there's no chance of the gfn
         * being backed by private memory, since guest_memfd must be used for

But I'm confused, because kvm_gmem_get_memory_attributes() doesn't actually take
filemap_invalidate_lock(), so what exactly is the problem?

> >                 max_level = PG_LEVEL_NUM;
> >                 is_private = kvm_mem_is_private(kvm, gfn);
> >         }
> >
> > --
> > 2.54.0.563.g4f69b47b94-goog
> >
> >

Reply via email to