Re: [PATCH RFC v3 00/43] guest_memfd: In-place conversion support

Sean Christopherson Fri, 13 Mar 2026 08:48:46 -0700

On Fri, Mar 13, 2026, Ackerley Tng wrote:
> Hi,
> 
> (Here's the motivation for this series, which I realized was missing from
> the earlier revisions of this series)


...

> I'm intending RFC (v3) as a basis for discussion of flags/content
> modes (name TBD) to allow userspace to request guarantees on how the memory
> contents will look like after setting memory attributes. The last 6 patches
> implement content mode support. These patches will be reordered, and some
> of them could be absorbed into earlier patches, in later revisions.
> 
> Here are the discussion points I can think of (please add on):
> 
> 1. (Might hopefully resolve soon?) Should ZERO be supported on shared to
>    private conversions? Discussion is at [6].

No.  There is no use case.  The entire point of CoCo is that the VMM is untrusted.
Having the guest rely on the VMM to zero memory makes no sense whatsoever.  There
may be a contract between the trusted whatever and the guest, but that's between
those two entities, the VMM is not involved, period.

PRESERVE is different because the intent is to allow the guest to operate on
*untrusted* data.  Operating on untrusted zeros is nonsensical.

ZERO for private=>shared is different because the VMM trusts the host kernel.

> 2. Do we need a CAP for userspace to query the flags/modes supported?

Yes.

>    It seems like there won't be anything dynamic about the flags/modes
>    supported.
> 
>    The userspace code can check what platform it is running on, and then
>    decide ZERO or PRESERVE based on the platform:
> 
>    If the VM is running on TDX,

No.  No, no, no, no.  I have said this over, and over, and over.  The contract
is between userspace and KVM, not between userspace and the underlying CoCo
implementation.  Anything that requires making assumptions based on the VM type
is a non-starter for me.
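
A rough sketch of what that looks like from userspace, purely for illustration:
the decision keys off the set of modes KVM enumerates, never off the VM type.
Every name below (CONVERT_F_ZERO, CONVERT_F_PRESERVE, pick_conversion_flag())
is hypothetical, and 'supported' stands in for whatever mask the eventual CAP
would return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real names and values are TBD. */
#define CONVERT_F_ZERO      (1ULL << 0)
#define CONVERT_F_PRESERVE  (1ULL << 1)

/*
 * Pick a content mode from the set KVM reports as supported.  'supported' is
 * assumed to come from a (not yet defined) capability query.  Returns 0 if
 * the wanted mode is unavailable, leaving the caller to fall back to the
 * default behavior or to fail the conversion.
 */
static uint64_t pick_conversion_flag(uint64_t supported, bool want_in_place)
{
	uint64_t want = want_in_place ? CONVERT_F_PRESERVE : CONVERT_F_ZERO;

	return (supported & want) ? want : 0;
}
```

Note that nothing in the helper asks whether the VM is TDX, SNP, pKVM, etc.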

>    it would want to specify ZERO all the
>    time. If the VM were running on pKVM it would want to specify PRESERVE
>    if it wants to enable in-place sharing, and ZERO if it wants to zero the
>    memory.
> 
>    If someday TDX supports PRESERVE, then there's room for discovery of
>    which algorithm to choose when running the guest. Perhaps that's when
>    the CAP should be introduced?
> 
> 3. What do people think of the structure of how various content modes are
>    checked for support or applied? I used overridable weak functions for
>    architectures that haven't defined support, and defined overrides for
>    x86 to show how I think it would work. For CoCo platforms, I only
>    implemented TDX for illustration purposes and might need help with the
>    other platforms. Should I have used kvm_x86_ops? I tried and found
>    myself defining lots of boilerplate.
> 
> 4. enum for ZERO and PRESERVE?
> 
>    Pros:
> 
>    * No way to define both ZERO and PRESERVE (make impossible states
>      unrepresentable)
>        * e.g. enum kvm_device_type in __u32 type in struct
>          kvm_create_device
>        * But maybe someday some modes can be used together?

Huh?  Oh, you don't mean "enum", you mean "values vs. flags".  Because in C you
can obviously have an enum of flags.

I don't have a strong preference, though I think I'd vote for flags.

Practically speaking, I doubt we'll ever have more than DEFAULT, ZERO, and PRESERVE,
i.e. more than '0', '1', and '2'.  Perhaps I lack imagination, but I can't think
of any operation that we would want to become ABI.  ZERO is special purely because
various CoCo implementations already zero memory on conversion.  Everything else
fits into PRESERVE, because if the kernel can perform the operation, then userspace
can do the same, and likely more performantly and obviously without needing a
contract with KVM.

The only other option I can think of is if a CoCo implementation wanted to use a
specific value other than '0' to fill a page on conversion.  Given that starting
from '0' is by far the most common state in computing, I just don't see that
happening.  E.g. that'd be like adding k1salloc() in addition to kmalloc() and
kzalloc().

So, we're likely only going to have DEFAULT, ZERO, and PRESERVE, at which point
whether we use flags or values is a wash in terms of how many bits we need: 2.
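
To make the "2 bits either way" point concrete, here is a small sketch.  The
enum, the MODE_F_* macros, and bits_needed() are all made-up names for
illustration; nothing here is proposed uAPI.

```c
#include <stdint.h>

/* Option 1: values.  DEFAULT, ZERO, PRESERVE as 0, 1, 2 (hypothetical). */
enum content_mode_value {
	MODE_DEFAULT  = 0,
	MODE_ZERO     = 1,
	MODE_PRESERVE = 2,
};

/* Option 2: flags.  DEFAULT is simply "no flag set" (hypothetical). */
#define MODE_F_ZERO      (1u << 0)
#define MODE_F_PRESERVE  (1u << 1)

/* Number of bits needed to store any value in [0, max]. */
static unsigned int bits_needed(uint32_t max)
{
	unsigned int bits = 0;

	while (max) {
		bits++;
		max >>= 1;
	}
	return bits;
}
```

The largest value (MODE_PRESERVE) and the full flag mask
(MODE_F_ZERO | MODE_F_PRESERVE) both fit in two bits.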

If we use flags, then we can have a single CAP to enumerate all FLAGS that are
supported by KVM_SET_MEMORY_ATTRIBUTES2.  If we use values, we'd need a separate
CAP for flags and a separate CAP for conversion operations.
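
A sketch of how a single supported-flags mask could cover everything, with
hypothetical names throughout (ATTR2_F_ZERO, ATTR2_F_PRESERVE,
check_attr2_flags()).  Rejecting ZERO and PRESERVE together is an assumption
made here for illustration, not something this series defines.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for the attributes ioctl; names and values TBD. */
#define ATTR2_F_ZERO      (1ULL << 0)
#define ATTR2_F_PRESERVE  (1ULL << 1)

/*
 * Validate 'flags' against 'supported', i.e. the one mask a single CAP would
 * report.  Returns 0 on success, -EINVAL for an unsupported flag or (by
 * assumption) for ZERO and PRESERVE being requested at the same time.
 */
static int check_attr2_flags(uint64_t flags, uint64_t supported)
{
	if (flags & ~supported)
		return -EINVAL;

	if ((flags & ATTR2_F_ZERO) && (flags & ATTR2_F_PRESERVE))
		return -EINVAL;

	return 0;
}
```

Any future, non-conversion flag would be enumerated and checked through the
same mask, with no second CAP needed.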

Using values would allow providing a dedicated field in kvm_memory_attributes2,
which _might_ make some code more readable.  But for me, that doesn't outweigh
the disadvantage of needing another CAP.
