Sean Christopherson <[email protected]> writes:
> On Thu, May 21, 2026, Fuad Tabba wrote:
>> Hi,
>>
>> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
>> <[email protected]> wrote:
>> >
>> > From: Michael Roth <[email protected]>
>> >
>> > For vm_memory_attributes=1, in-place conversion/population is not
>> > supported, so the initial contents necessarily must need to come
>> > from a separate src address, which is enforced by the current
>> > implementation. However, for vm_memory_attributes=0, it is possible for
>> > guest memory to be initialized directly from userspace by mmap()'ing the
>> > guest_memfd and writing to it while the corresponding GPA ranges are in
>> > a 'shared' state before converting them to the 'private' state expected
>> > by KVM_SEV_SNP_LAUNCH_UPDATE.
>> >
>> > Update the handling/documentation for KVM_SEV_SNP_LAUNCH_UPDATE to allow
>> > for 'uaddr' to be set to NULL when vm_memory_attributes=0, which
>> > SNP_LAUNCH_UPDATE will then use to determine when it should/shouldn't
>> > copy in data from a separate memory location. Continue to enforce
>> > non-NULL for the original vm_memory_attributes=1 case.
>> >
>> > Signed-off-by: Michael Roth <[email protected]>
>> > [Added src_page check in error handling path when the firmware command
>> > fails]
>> > [Dropped ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES]
>> > Signed-off-by: Ackerley Tng <[email protected]>
>>
>> I'm not very familiar with the SEV-SNP populate flows, but it looks
>> like Sashiko is on to something:
>> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=21
>>
>> - a potential read-only page overwrite, because src_page is acquired
>> via get_user_pages_fast() without the FOLL_WRITE flag, but is then
>> overwritten via memcpy
>
> Oof, yeah, that's bad. Adding FOLL_WRITE to kvm_gmem_populate() feels wrong,
> and
> could break uABI, but doing gup() in SNP code would reintroduce the AB-BA
> issue
> with filemap_invalidate_lock().
>
> Aha! Not if we use get_user_page_fast_only(). Ugh, but then we'd have to
> plumb
> the userspace address into the post-populated callback.
>
> Hrm. Given that no one has yelled about overwriting their CPUID page, and
> given
> that the CPUID page is likely dynamically created and thus is unlikely to be a
> read-only mapping (e.g. versus the initial image), maybe this?
>
Overwriting the CPUID page is by design, I think. IIUC if the SNP
firmware doesn't like something about the CPUID page, it can update
src_page and then return an error to userspace.
Userspace should then check if it agrees with the updated CPUID contents
and then retry if it agrees.
> diff --git arch/x86/kvm/svm/sev.c arch/x86/kvm/svm/sev.c
> index 37d4cfa5d980..c73c028d72c1 100644
> --- arch/x86/kvm/svm/sev.c
> +++ arch/x86/kvm/svm/sev.c
> @@ -2456,6 +2456,7 @@ static int snp_launch_update(struct kvm *kvm, struct
> kvm_sev_cmd *argp)
> sev_populate_args.type = params.type;
>
> count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
> + params.type == KVM_SEV_SNP_PAGE_TYPE_CPUID,
I think this makes sense given that writing to src_page can only happen
when params.type == KVM_SEV_SNP_PAGE_TYPE_CPUID (this is explicitly one
of the guards in sev_gmem_post_populate()):
/*
* If the firmware command failed handle the reclaim and cleanup of that
* PFN before reporting an error.
*
* Additionally, when invalid CPUID function entries are detected,
* firmware writes the expected values into the page and leaves it
* unencrypted so it can be used for debugging and error-reporting.
*
* Copy this page back into the source buffer so userspace can use this
* information to provide information on which CPUID leaves/fields
* failed CPUID validation.
*/
if (ret && !snp_page_reclaim(kvm, pfn) &&
sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
sev_populate_args->fw_error == SEV_RET_INVALID_PARAM && src_page) {
void *src_vaddr = kmap_local_page(src_page);
void *dst_vaddr = kmap_local_pfn(pfn);
memcpy(src_vaddr, dst_vaddr, PAGE_SIZE);
kunmap_local(src_vaddr);
kunmap_local(dst_vaddr);
}
> sev_gmem_post_populate, &sev_populate_args);
> if (count < 0) {
> argp->error = sev_populate_args.fw_error;
> diff --git arch/x86/kvm/vmx/tdx.c arch/x86/kvm/vmx/tdx.c
> index f97bcf580e6d..33f35be4455b 100644
> --- arch/x86/kvm/vmx/tdx.c
> +++ arch/x86/kvm/vmx/tdx.c
> @@ -3188,7 +3188,7 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu
> *vcpu, struct kvm_tdx_cmd *c
> };
> gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
>
> u64_to_user_ptr(region.source_addr),
> - 1, tdx_gmem_post_populate, &arg);
> + 1, false,
> tdx_gmem_post_populate, &arg);
And TDX doesn't try to write src_page, so this is good too.
> if (gmem_ret < 0) {
> ret = gmem_ret;
> break;
> diff --git include/linux/kvm_host.h include/linux/kvm_host.h
> index 61a3430957f2..b83cda2870ba 100644
> --- include/linux/kvm_host.h
> +++ include/linux/kvm_host.h
> @@ -2596,7 +2596,8 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn,
> kvm_pfn_t pfn, int max_ord
> typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t
> pfn,
> struct page *page, void *opaque);
>
> -long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long
> npages,
> +long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
> + long npages, bool writable,
What do you think of need_writable_src instead of just writable for the
variable name?
> kvm_gmem_populate_cb post_populate, void *opaque);
> #endif
>
> diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
> index a35a55571a2d..6553d4e032ce 100644
> --- virt/kvm/guest_memfd.c
> +++ virt/kvm/guest_memfd.c
> @@ -858,7 +858,8 @@ static long __kvm_gmem_populate(struct kvm *kvm, struct
> kvm_memory_slot *slot,
> return ret;
> }
>
> -long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
> long npages,
> +long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
> + long npages, bool writable,
> kvm_gmem_populate_cb post_populate, void *opaque)
> {
> struct kvm_memory_slot *slot;
> @@ -892,8 +893,9 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn,
> void __user *src, long
>
> if (src) {
> unsigned long uaddr = (unsigned long)src + i *
> PAGE_SIZE;
> + unsigned int flags = writable ? FOLL_WRITE : 0;
How about using FOLL_WRITE | FOLL_NOFAULT so if it weren't writable to
start with, don't CoW, just error out?
Like you said above the CPUID page provided as src_page would have been
written to before, so it should have been mapped as writable.
>
> - ret = get_user_pages_fast(uaddr, 1, 0, &src_page);
> + ret = get_user_pages_fast(uaddr, 1, flags, &src_page);
If we stick with FOLL_WRITE, this also solves the case where a read-only
mapping or global zero page are provided as src_page, since
get_user_pages_fast() will do a copy-on-write if those were the inputs,
making it writable before the write happens (on failure) in
sev_gmem_post_populate().
> if (ret < 0)
> break;
> if (ret != 1) {
>
>> - an ordering violation with the kunmap_local() calls
>
> Yeesh, that's a new one for me. Thankfully this is 64-bit only, so it's not
> an
> issue.
>
>> These predate this patch series and are just being touched by the
>> 'src_page' addition, but if Sashiko's right, these should probably be
>> fixed sooner rather than later.
>
> Yeah, ditto with the offset wrapping case.