Ackerley Tng <[email protected]> writes: > > [...snip...] > > guest_memfd's populate will first check that the memory is shared, then > also set the memory to private after the populate. > > [...snip...] > Looking at this again, the above basically means that the entire conversion process needs to be performed within populate.
In addition to setting the attributes in guest_memfd as private, for consistency, populate will also have to do all the associated operations, especially unmapping from the host, checking refcounts, and the list of work in conversion will only increase in future with direct map removal/restoration and huge page merging. The complexity of conversion also means possible errors (EAGAIN for elevated refcounts and ENOMEM when we're out of memory), and error information like the offset where the elevated refcount was. It doesn't look like there's room for this information to be plumbed out through the platform-specific ioctls, and even if we make space, it seems odd to have conversion-related error information returned through the platform-specific call. I agree with the goal of not having KVM touch private memory contents as tracked by guest_memfd, so I'd like to propose that we distinguish: 1. private as tracked by KVM (guest_memfd/vm_memory_attributes) 2. private as tracked by trusted entity Currently, in TDX's populate flow, KVM doesn't do any copying, it only instructs TDX to do the copying. "TDH.MEM.PAGE.ADD Operands Information Definition" in the TDX module ABI spec says the source is accessed as shared, the destination is accessed as private, and in the case that source and destination are the same, the "In-Place Add" section says the source will be converted to private. In SNP's populate flow, SNP-specific code in KVM does the copying from shared memory into the destination, then makes the destination address private in the RMP table before telling the firmware to do the in-place encryption. I think we should think of the populate ioctls as being platform-specific ioctls that, when called, accept + destination address: private (as tracked by guest_memfd) + source address: shared (as tracked by guest_memfd) or NULL KVM doesn't touch private memory contents, even in this case, because it's really a platform-specific ioctl that handles loading, and the platform does expect the destination is private for both TDX and SNP at the firmware boundary. Since SNP (platform-specific) only allows in-place launch update, and KVM had to provide an interface that allows a different source address for support before in-place conversion, then SNP has to continue supporting the to-be-deprecated path by doing the copying within platform-specific code. For consistency, guest_memfd can continue to check that it tracks the destination address as private, and sev_gmem_populate will then hide the copying code away just to support the legacy case. The flow before in-place conversion is 1. Load memory (shared or non-guest_memfd memory) 2. KVM_SEV_SNP_LAUNCH_UPDATE or KVM_TDX_INIT_MEM_REGION (destination: gfn for separate private memory, source: shared memory) The proposed flow for in-place conversion is 1. INIT_SHARED or convert to shared 2. Load memory while it is shared 3. Convert to private (PRESERVE, or new flag?) 4. KVM_SEV_SNP_LAUNCH_UPDATE or KVM_TDX_INIT_MEM_REGION (destination: gfn for converted private memory, source: NULL) TLDR: + Think of populate ioctls not as KVM touching memory, but platform handling population. + KVM code (kvm_gmem_populate) still doesn't touch memory contents + post_populate is platform-specific code that handles loading into private destination memory just to support legacy non-in-place conversion. + Don't complicate populate ioctls by doing conversion just to support legacy use-cases where platform-specific code has to do copying on the host. >>> >>> [...snip...] >>>
