On Tue, Jun 23, 2026, Yan Zhao wrote:
> On Tue, Jun 23, 2026 at 01:16:14PM +0800, Yan Zhao wrote:
> > On Mon, Jun 22, 2026 at 06:22:45PM -0700, Sean Christopherson wrote:
> > > On Mon, Jun 22, 2026, Yan Zhao wrote:
> > > > On Thu, Jun 18, 2026 at 05:32:00PM -0700, Ackerley Tng via B4 Relay
> > > > wrote:
> > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > > > index ffe9d0db58c59..56d10333c61a7 100644
> > > > > --- a/arch/x86/kvm/vmx/tdx.c
> > > > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > > > @@ -3198,8 +3198,12 @@ static int tdx_gmem_post_populate(struct kvm
> > > > > *kvm, gfn_t gfn, kvm_pfn_t pfn,
> > > > > if (KVM_BUG_ON(kvm_tdx->page_add_src, kvm))
> > > > > return -EIO;
> > > > >
> > > > > - if (!src_page)
> > > > > - return -EOPNOTSUPP;
> > > > > + if (!src_page) {
> > > > > + if (!gmem_in_place_conversion)
> > > > When userspace turns on gmem_in_place_conversion while creating
> > > > guest_memfd
> > > > without the MMAP flag, the absence of src_page should still be treated
> > > > as an
> > > > error.
> > >
> > > Why MMAP?
> > Hmm, I was showing a scenario that in-place conversion couldn't occur.
> > I didn't mean that with the MMAP flag, mmap() and user write must occur.
> >
> > > Shouldn't this be a general "if (!src_page && !up-to-date)"? Just
> > > because userspace _can_ mmap() the memory doesn't mean userspace _has_
> > > mmap()'d
> > > and written memory. And when write() lands, MMAP wouldn't be necessary to
> > > initialize the memory.
> > Do you mean using up-to-date flag as below?
Yes? I didn't actually look at the implementation details.
> > if (!src_page) {
> > src_page = pfn_to_page(pfn);
> > if (!folio_test_uptodate(page_folio(src_page)))
> > return -EOPNOTSUPP;
> > }
>
> Another concern with this fix is that:
> commit "KVM: guest_memfd: Zero page while getting pfn" [1] always marks the
> folio uptodate before reaching post_populate().
>
> [1]
> https://lore.kernel.org/all/[email protected]/
>
> > One concern is that TDX now does not much care about the up-to-date flag
> > since
> > TDX doesn't rely on the flag to clear pages on conversions.
> > I'm not sure if the flag can be reliably checked in this case. e.g.,
> > now the whole folio is marked up-to-date even if only part of it is faulted
> > by
> > user access.
> > Ensuring that the up-to-date flag works correctly with huge page support
> > seems
> > to have more effort than introducing a dedicated flag for TDX.
> >
> > > > Additionally, to properly enable in-place copying for the TDX initial
> > > > memory
> > > > region, userspace must not only specify source_addr to NULL, but also
> > > > follow
> > > > a specific sequence (where steps 1/2/3/7 are required only for in-place
> > > > copy):
> > > > 1. create guest_memfd with MMAP flag
> > > > 2. mmap the guest_memfd.
> > > > 3. convert the initial memory range to shared.
> > > > 4. copy initial content to the source page.
> > > > 5. convert the initial memory range to private
> > > > 6. invoke ioctl KVM_TDX_INIT_MEM_REGION.
> > > > 7. do not unmap the source backend.
> > > >
> > > > So, would it be reasonable to introduce a dedicated flag that allows
> > > > userspace
> > > > to explicitly opt into the in-place copy functionality? e.g.,
> > >
> > > Why? It's userspace's responsibility to get the above right. If
> > > userspace fails
> > > to provide a src_page when it doesn't want in-place copy, that's a
> > > userspace bug.
> > I mean if userspace specifies a NULL source_addr by mistake, it's better for
> > kernel to detect this mistake, similar to how it validates whether
> > source_addr
> > is PAGE_ALIGNED.
The alignment case is different. If userspace provides an unaligned value, KVM
*can't* do what userspace is asking because hardware and thus KVM only supports
converting on page boundaries.
For a NULL source, KVM can still do what userspace is asking. Rejecting
userspace's
request would then be making assumptions about what userspace wants.
> > Since userspace already needs to perform additional steps to enable in-place
> > copy, specifying a dedicated flag to indicate that the NULL source_addr is
> > intentional seems like a reasonable burden.
I don't see how it adds any value. I wouldn't be at all surprised if most VMMs
just wen up with code that does:
if (in-place) {
src = NULL;
flags |= KVM_TDX_IN_PLACE_COPY_INITIAL_MEMORY_REGION;
}