On Wed, May 20, 2026, Fuad Tabba wrote:
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <[email protected]> wrote:
> >
> > From: Ackerley Tng <[email protected]>
> >
> > When the maximum mapping level is queried, KVM's MMU lock is held, and
> > while the MMU lock is held, guest_memfd cannot take the
> > filemap_invalidate_lock() to look up the current shared/private state of
> > the gfn, for these reasons:
> >
> > + The MMU lock is a spinlock or rwlock and cannot be held while taking a
> > lock that can sleep.
> > + In guest_memfd's code paths (such as truncate), the
> > filemap_invalidate_lock() is held while taking the MMU lock, and taking
> > the locks in reverse order would introduce a AB-BA deadlock.
> >
> > Currently, the maximum mapping level is only queried from guest_memfd in
> > the process of recovering huge pages, if dirty logging is disabled on a
> > memslot. Dirty logging is not currently supported for guest_memfd, and
> > guest_memfd memslots also cannot be updated.
> >
> > For now, bug the VM if guest_memfd needs to be queried to determine the
> > maximum mapping level. This guard can be removed if/when support is added.
> >
> > Signed-off-by: Ackerley Tng <[email protected]>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a80a876ab4ad6..153bcc5369985 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3357,6 +3357,15 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> > struct kvm_page_fault *fault,
> > max_level = fault->max_level;
> > is_private = fault->is_private;
> > } else {
> > + /*
> > + * Memory attributes cannot be obtained from guest_memfd
> > while
> > + * the MMU lock is held.
> > + */
> > + if
> > (KVM_BUG_ON(static_call_query(__kvm_get_memory_attributes) ==
> > + kvm_gmem_get_memory_attributes, kvm)) {
> > + return 0;
> > + }
> > +
>
> This directly takes the address of kvm_gmem_get_memory_attributes,
> which is only compiled if CONFIG_KVM_GUEST_MEMFD=y. This breaks
> ARCH=i386.
And this bleeds guest_memfd implementation details into places they don't
belong.
The right way to deal with this is to use lockdep_assert_not_held() in whatever
code mustn't run with mmu_lock held. E.g.
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index c9f155c2dc5c..3bea9c1137ef 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -547,6 +547,9 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm
*kvm, gfn_t gfn)
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
struct inode *inode;
+ /* Comment goes here. */
+ lockdep_assert_not_held(&kvm->mmu_lock);
+
/*
* If this gfn has no associated memslot, there's no chance of the gfn
* being backed by private memory, since guest_memfd must be used for
But I'm confused, because kvm_gmem_get_memory_attributes() doesn't actually take
filemap_invalidate_lock(), so what exactly is the problem?
> > max_level = PG_LEVEL_NUM;
> > is_private = kvm_mem_is_private(kvm, gfn);
> > }
> >
> > --
> > 2.54.0.563.g4f69b47b94-goog
> >
> >