On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
<[email protected]> wrote:
>
> From: Ackerley Tng <[email protected]>
>
> When memory in guest_memfd is converted from private to shared, the
> platform-specific state associated with the guest-private pages must be
> invalidated or cleaned up.
>
> Iterate over the folios in the affected range and call the
> kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> architectures to perform necessary teardown, such as updating hardware
> metadata or encryption states, before the pages are transitioned to the
> shared state.
>
> Invoke this helper after indicating to KVM's mmu code that an invalidation
> is in progress to stop in-flight page faults from succeeding.
>
> Reviewed-by: Fuad Tabba <[email protected]>
> Signed-off-by: Ackerley Tng <[email protected]>
Coming back to this after working through the arm64/pKVM side. My
Reviewed-by here is from the previous round and the patch hasn't
changed, but I missed an implication for arm64.
kvm_arch_gmem_invalidate() is now called from two paths with the same
(start, end) signature: folio teardown (kvm_gmem_free_folio) and
private->shared conversion (here). For SNP/TDX that's fine, conversion is
destructive anyway. For pKVM the two need opposite content semantics:
conversion must preserve the page in place (same physical page, the point
of in-place conversion without encryption), while teardown must scrub it
before returning it to the host.
The hook gets only a pfn range with no indication of which caller it's
serving, so arm64 can't give the two paths the behaviour they need. It
would help to signal intent on the conversion path: a reason/flag, a
separate hook, or not routing non-destructive conversion through the
teardown hook.
arm64 isn't here yet, so this isn't urgent, but the hook is gaining a
second caller now, and it's cheaper to leave room for the distinction
than to change a generic contract other arches depend on later.
Cheers,
/fuad
> ---
> virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 433f79047b9d1..3c94442bc8131 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode
> *inode, pgoff_t start,
> return safe;
> }
>
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t
> end)
> +{
> + struct folio_batch fbatch;
> + pgoff_t next = start;
> + int i;
> +
> + folio_batch_init(&fbatch);
> + while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch))
> {
> + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> + struct folio *folio = fbatch.folios[i];
> + pgoff_t start_index, end_index;
> + kvm_pfn_t start_pfn, end_pfn;
> +
> + start_index = max(start, folio->index);
> + end_index = min(end, folio_next_index(folio));
> + /*
> + * end_index is either in folio or points to
> + * the first page of the next folio. Hence,
> + * all pages in range [start_index, end_index)
> + * are contiguous.
> + */
> + start_pfn = folio_file_pfn(folio, start_index);
> + end_pfn = start_pfn + end_index - start_index;
> +
> + kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> + }
> +
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +}
> +#else
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t
> end) {}
> +#endif
> +
> static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> size_t nr_pages, uint64_t attrs,
> pgoff_t *err_index)
> @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode
> *inode, pgoff_t start,
> */
>
> kvm_gmem_invalidate_start(inode, start, end);
> +
> + if (!to_private)
> + kvm_gmem_invalidate(inode, start, end);
> +
> mas_store_prealloc(&mas, xa_mk_value(attrs));
> +
> kvm_gmem_invalidate_end(inode, start, end);
> out:
> filemap_invalidate_unlock(mapping);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>