On Fri, Jun 19, 2026, Fuad Tabba wrote:
> On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
> <[email protected]> wrote:
> >
> > From: Ackerley Tng <[email protected]>
> >
> > When memory in guest_memfd is converted from private to shared, the
> > platform-specific state associated with the guest-private pages must be
> > invalidated or cleaned up.
> >
> > Iterate over the folios in the affected range and call the
> > kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> > architectures to perform necessary teardown, such as updating hardware
> > metadata or encryption states, before the pages are transitioned to the
> > shared state.
> >
> > Invoke this helper after indicating to KVM's mmu code that an invalidation
> > is in progress to stop in-flight page faults from succeeding.
> >
> > Reviewed-by: Fuad Tabba <[email protected]>
> > Signed-off-by: Ackerley Tng <[email protected]>
>
> Coming back to this after working through the arm64/pKVM side. My
> Reviewed-by here is from the previous round and the patch hasn't
> changed, but I missed an implication for arm64.
>
> kvm_arch_gmem_invalidate() is now called from two paths with the same
> (start, end) signature: folio teardown (kvm_gmem_free_folio) and
> private->shared conversion (here). For SNP/TDX that's fine, conversion is
> destructive anyway. For pKVM the two need opposite content semantics:
> conversion must preserve the page in place (same physical page, the point
> of in-place conversion without encryption), while teardown must scrub it
> before returning it to the host.
>
> The hook gets only a pfn range with no indication of which caller it's
> serving, so arm64 can't give the two paths the behaviour they need. It
> would help to signal intent on the conversion path: a reason/flag, a
> separate hook, or not routing non-destructive conversion through the
> teardown hook.
>
> arm64 isn't here yet, so this isn't urgent, but the hook is gaining a
> second caller now, and it's cheaper to leave room for the distinction
> than to change a generic contract other arches depend on later.
Crud. It may not be urgent for arm64, but it's urgent for other reasons that
I "can't" describe in detail at the moment, and even if that weren't the case, I
think we should clean things up now. More below.
> > virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 41 insertions(+)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 433f79047b9d1..3c94442bc8131 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct
> > inode *inode, pgoff_t start,
> > return safe;
> > }
> >
> > +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start,
> > pgoff_t end)
Not your fault, but kvm_arch_gmem_invalidate() is badly misnamed. It's not
"invalidating" anything, it's much more of a "free" callback, as SNP uses it to
put physical pages back into a shared state when a maybe-private folio is freed.
As Fuad points out, (ab)using that hook for the private=>shared conversion case
"works", but not broadly. And it makes the bad name worse, because it's called
from code that _is_ doing true invalidations. For pKVM, it may not even need to
do anything invalidation-like.
To avoid a conflict with patches that are going to have priority over this
series,
to set the stage for arm64 support, and to avoid avoid bleeding vendor details
into guest_memfd, as if they are core guest_memfd behavior (only SNP needs the
"invalidation" on this specific transition), I think we should add an arch hook
to do conversions straightaway.
Unless there's a clever option I'm missing, it'll mean adding yet another
HAVE_KVM_ARCH_GMEM_XXX flag? Hmm, especially because IIUC, arm64/pKVM doesn't
need a callback for this case, only the free_folio case.
> > +{
> > + struct folio_batch fbatch;
> > + pgoff_t next = start;
> > + int i;
> > +
> > + folio_batch_init(&fbatch);
> > + while (filemap_get_folios(inode->i_mapping, &next, end - 1,
> > &fbatch)) {
> > + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> > + struct folio *folio = fbatch.folios[i];
> > + pgoff_t start_index, end_index;
> > + kvm_pfn_t start_pfn, end_pfn;
> > +
> > + start_index = max(start, folio->index);
> > + end_index = min(end, folio_next_index(folio));
> > + /*
> > + * end_index is either in folio or points to
> > + * the first page of the next folio. Hence,
> > + * all pages in range [start_index, end_index)
> > + * are contiguous.
> > + */
> > + start_pfn = folio_file_pfn(folio, start_index);
> > + end_pfn = start_pfn + end_index - start_index;
> > +
> > + kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> > + }
> > +
> > + folio_batch_release(&fbatch);
> > + cond_resched();
> > + }
> > +}
> > +#else
> > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start,
> > pgoff_t end) {}
> > +#endif
> > +
> > static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> > size_t nr_pages, uint64_t attrs,
> > pgoff_t *err_index)
> > @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode
> > *inode, pgoff_t start,
> > */
> >
> > kvm_gmem_invalidate_start(inode, start, end);
> > +
> > + if (!to_private)
> > + kvm_gmem_invalidate(inode, start, end);
E.g. instead make this something like this?
kvm_gmem_set_pfn_attributes(...)
Hrm, though that wastes folio lookups in the to_private case. So maybe just
this,
assuming pKVM doesn't need to take additional action on conversions?
if (!to_private)
kvm_gmem_make_shared(...)
Actually, if we do that, then we don't need a separate arch hook, just a
separate
config. It'll still bleed SNP details into guest_memfd, but it'll at least be
done in a way that's more explicitly arch specific (and it's no different than
what we already do for PREPARE...).
E.g. this? There will still be a looming rename conflict, but that's easy
enough
to handle.
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index 9ce5be7843f2..8aead0abd788 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -648,8 +648,8 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode
*inode, pgoff_t start,
return safe;
}
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t
end)
+#ifdef CONFIG_KVM_ARCH_GMEM_FREE_ON_SHARED_CONVERSION
+static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t
end)
{
struct folio_batch fbatch;
pgoff_t next = start;
@@ -681,7 +681,7 @@ static void kvm_gmem_invalidate(struct inode *inode,
pgoff_t start, pgoff_t end)
}
}
#else
-static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t
end) {}
+static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t
end) { }
#endif
static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
@@ -729,7 +729,7 @@ static int __kvm_gmem_set_attributes(struct inode *inode,
pgoff_t start,
kvm_gmem_invalidate_start(inode, start, end);
if (!to_private)
- kvm_gmem_invalidate(inode, start, end);
+ kvm_gmem_make_shared(inode, start, end);
mas_store_prealloc(&mas, xa_mk_value(attrs));