Sean Christopherson <[email protected]> writes:

> On Tue, Jun 23, 2026, Yan Zhao wrote:
>> On Tue, Jun 23, 2026 at 01:16:14PM +0800, Yan Zhao wrote:
>> > On Mon, Jun 22, 2026 at 06:22:45PM -0700, Sean Christopherson wrote:
>> > > On Mon, Jun 22, 2026, Yan Zhao wrote:
>> > > > On Thu, Jun 18, 2026 at 05:32:00PM -0700, Ackerley Tng via B4 Relay 
>> > > > wrote:
>> > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>> > > > > index ffe9d0db58c59..56d10333c61a7 100644
>> > > > > --- a/arch/x86/kvm/vmx/tdx.c
>> > > > > +++ b/arch/x86/kvm/vmx/tdx.c
>> > > > > @@ -3198,8 +3198,12 @@ static int tdx_gmem_post_populate(struct kvm 
>> > > > > *kvm, gfn_t gfn, kvm_pfn_t pfn,
>> > > > >      if (KVM_BUG_ON(kvm_tdx->page_add_src, kvm))
>> > > > >              return -EIO;
>> > > > >
>> > > > > -    if (!src_page)
>> > > > > -            return -EOPNOTSUPP;
>> > > > > +    if (!src_page) {
>> > > > > +            if (!gmem_in_place_conversion)
>> > > > When userspace turns on gmem_in_place_conversion while creating 
>> > > > guest_memfd
>> > > > without the MMAP flag, the absence of src_page should still be treated 
>> > > > as an
>> > > > error.
>> > >
>> > > Why MMAP?
>> > Hmm, I was showing a scenario that in-place conversion couldn't occur.
>> > I didn't mean that with the MMAP flag, mmap() and user write must occur.
>> >
>> > > Shouldn't this be a general "if (!src_page && !up-to-date)"?  Just
>> > > because userspace _can_ mmap() the memory doesn't mean userspace _has_ 
>> > > mmap()'d
>> > > and written memory.  And when write() lands, MMAP wouldn't be necessary 
>> > > to
>> > > initialize the memory.
>> > Do you mean using up-to-date flag as below?
>
> Yes?  I didn't actually look at the implementation details.
>
>> > if (!src_page) {
>> >    src_page = pfn_to_page(pfn);
>> >    if (!folio_test_uptodate(page_folio(src_page)))
>> >            return -EOPNOTSUPP;
>> > }

Yan is right that with the earlier patch "Zero page while getting pfn",
folio_test_uptodate() here will always return true.

Actually, this is an alternative fix for the issue Sashiko pointed out
on v7 where userspace can do a populate() (either TDX or SNP) without
first allocating the page, with src_address == NULL, and leak
uninitialized memory into the guest.

Advantage of using the uptodate check in populate: if the host never
allocates the page, populate doesn't incur zeroing before writing the
page anyway in populate().

Disadvantage: Both TDX and SNP will have to implement this uptodate
check. guest_memfd can't check centrally because for SNP, for a
PAGE_TYPE_ZERO, !src_page should be allowed with a !uptodate page since
firmware will zero and there's no leakage of uninitialized host memory?

>>
>> Another concern with this fix is that:
>> commit "KVM: guest_memfd: Zero page while getting pfn" [1] always marks the
>> folio uptodate before reaching post_populate().
>>
>> [1] 
>> https://lore.kernel.org/all/[email protected]/
>>
>> > One concern is that TDX now does not much care about the up-to-date flag 
>> > since
>> > TDX doesn't rely on the flag to clear pages on conversions.
>> > I'm not sure if the flag can be reliably checked in this case. e.g.,
>> > now the whole folio is marked up-to-date even if only part of it is 
>> > faulted by
>> > user access.
>> > Ensuring that the up-to-date flag works correctly with huge page support 
>> > seems
>> > to have more effort than introducing a dedicated flag for TDX.
>> >
>> > > > Additionally, to properly enable in-place copying for the TDX initial 
>> > > > memory
>> > > > region, userspace must not only specify source_addr to NULL, but also 
>> > > > follow
>> > > > a specific sequence (where steps 1/2/3/7 are required only for 
>> > > > in-place copy):
>> > > > 1. create guest_memfd with MMAP flag
>> > > > 2. mmap the guest_memfd.
>> > > > 3. convert the initial memory range to shared.
>> > > > 4. copy initial content to the source page.
>> > > > 5. convert the initial memory range to private
>> > > > 6. invoke ioctl KVM_TDX_INIT_MEM_REGION.
>> > > > 7. do not unmap the source backend.
>> > > >
>> > > > So, would it be reasonable to introduce a dedicated flag that allows 
>> > > > userspace
>> > > > to explicitly opt into the in-place copy functionality? e.g.,
>> > >
>> > > Why?  It's userspace's responsibility to get the above right.  If 
>> > > userspace fails
>> > > to provide a src_page when it doesn't want in-place copy, that's a 
>> > > userspace bug.

Yan, is your concern that userspace forgot to update the code and
forgets to provide a src_page, and if we keep the "Zero page while
getting pfn" patch, ends up with the guest silently having a zero page?
I think that would be found quite early in userspace VMM testing...

>> > I mean if userspace specifies a NULL source_addr by mistake, it's better 
>> > for
>> > kernel to detect this mistake, similar to how it validates whether 
>> > source_addr
>> > is PAGE_ALIGNED.
>
> The alignment case is different.  If userspace provides an unaligned value, 
> KVM
> *can't* do what userspace is asking because hardware and thus KVM only 
> supports
> converting on page boundaries.
>
> For a NULL source, KVM can still do what userspace is asking.  Rejecting 
> userspace's
> request would then be making assumptions about what userspace wants.
>

Also, +1 on this, what if userspace, knowing that pages are zeroed on
allocation, actually wants to rely on that to get a zero page in the guest?

>> > Since userspace already needs to perform additional steps to enable 
>> > in-place
>> > copy, specifying a dedicated flag to indicate that the NULL source_addr is
>> > intentional seems like a reasonable burden.
>
> I don't see how it adds any value.  I wouldn't be at all surprised if most 
> VMMs
> just wen up with code that does:
>
>       if (in-place) {
>               src = NULL;
>               flags |= KVM_TDX_IN_PLACE_COPY_INITIAL_MEMORY_REGION;
>       }

Reply via email to