On Fri, Nov 14, 2025, Nikita Kalyazin wrote:
> ---
>  Documentation/virt/kvm/api.rst |  2 ++
>  include/linux/kvm_host.h       |  2 +-
>  include/uapi/linux/kvm.h       |  1 +
>  virt/kvm/guest_memfd.c         | 52 ++++++++++++++++++++++++++++++++++
>  4 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 57061fa29e6a..9541e95fc2ed 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6448,6 +6448,8 @@ specified via KVM_CREATE_GUEST_MEMFD.  Currently defined flags:
>                                 without INIT_SHARED will be marked private).
>                                 Shared memory can be faulted into host userspace
>                                 page tables. Private memory cannot.
> +  GUEST_MEMFD_FLAG_WRITE       Enable using write() on the guest_memfd file
> +                               descriptor.

Not the greatest place for it due to limited space, but the page alignment and
shared restrictions should be documented, and this seems to be the best spot.
And whatever we do on a partial copy also needs to be documented.  E.g.

  GUEST_MEMFD_FLAG_WRITE       Enable using write() on the guest_memfd file
                               descriptor.  The start and size of the write
                               must be page aligned, and all pages must be in
                               a SHARED state.  If the full buffer cannot be
                               copied for a given page, <something happens>.

> @@ -421,6 +423,53 @@ void kvm_gmem_init(struct module *module)
>       kvm_gmem_fops.owner = module;
>  }
>  
> +static bool kvm_gmem_supports_write(struct inode *inode)
> +{
> +     const u64 flags = (u64)inode->i_private;
> +
> +     return flags & GUEST_MEMFD_FLAG_WRITE;
> +}
> +
> +static int kvm_gmem_write_begin(const struct kiocb *kiocb,
> +                             struct address_space *mapping,
> +                             loff_t pos, unsigned int len,
> +                             struct folio **folio, void **fsdata)
> +{
> +     struct inode *inode = file_inode(kiocb->ki_filp);
> +
> +     if (!kvm_gmem_supports_write(inode))

Eh, no need for a helper, especially since flags is now easier to get at:

        if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_WRITE))
                return -ENODEV;

I also think we should leave ourselves a safety net for in-place conversion, and
WARN if the gmem instance isn't INIT_SHARED:

        if (WARN_ON_ONCE(!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)))
                return -EBUSY;

That will also provide a good place to actually verify the memory is shared once
in-place conversion comes along.
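For illustration, the two checks combine into something like the following userspace-compilable sketch; the flag bit positions and the gmem_inode/GMEM_I() shape are mocked here (the real definitions live in include/uapi/linux/kvm.h and virt/kvm/guest_memfd.c), and the WARN_ON_ONCE is elided:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mocked flag bits and inode state; bit positions are illustrative
 * only and do not match the kernel UAPI. */
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 0)
#define GUEST_MEMFD_FLAG_WRITE		(1ULL << 1)

struct gmem_inode { uint64_t flags; };

static int gmem_check_write_flags(const struct gmem_inode *gi)
{
	if (!(gi->flags & GUEST_MEMFD_FLAG_WRITE))
		return -ENODEV;

	/* Safety net for future in-place conversion: refuse write() on
	 * instances that were not created fully shared. */
	if (!(gi->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
		return -EBUSY;

	return 0;
}
```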

> +             return -ENODEV;
> +
> +     if (pos + len > i_size_read(inode))
> +             return -EINVAL;
> +
> +     if (!IS_ALIGNED(pos, PAGE_SIZE) || !IS_ALIGNED(len, PAGE_SIZE))
> +             return -EINVAL;
> +
> +     *folio = kvm_gmem_get_folio(inode, pos >> PAGE_SHIFT);
> +     if (IS_ERR(*folio))
> +             return PTR_ERR(*folio);
> +
> +     return 0;
> +}
> +
> +static int kvm_gmem_write_end(const struct kiocb *kiocb,
> +                           struct address_space *mapping,
> +                           loff_t pos, unsigned int len,
> +                           unsigned int copied,
> +                           struct folio *folio, void *fsdata)
> +{
> +     if (!folio_test_uptodate(folio)) {
> +             folio_zero_range(folio, copied, len - copied);

Hmm, do we actually want to zero and silently ignore the failure?  Given the
intended use case, silently failing here would be a terrible outcome.  Would it
make sense to instead do this?

        if (len != copied)
                return -EFAULT;

        if (!folio_test_uptodate(folio))
                folio_mark_uptodate(folio);

        folio_unlock(folio);
        folio_put(folio);

        return copied;

That will cause generic_perform_write() to report -EFAULT if no pages were written,
or IIUC, return the position of the last _full_ page that was written.  Then in
the unlikely scenario userspace wants to retry, they can retry starting at the
page that was partially written.  That seems like what VMMs generally would want,
not silent failure.
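Under those semantics, the retry loop on the VMM side would look roughly like this. This is a hedged sketch: fake_write() below is a stand-in for write() on a guest_memfd fd (a real VMM would call write(2) and handle EINTR etc.), and the per-call copy cap merely simulates a short copy:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Fake "write()" that copies at most 'cap' bytes per call, standing in
 * for a guest_memfd fd. */
struct fake_fd { char mem[4 * PAGE_SIZE]; size_t cap; };

static long fake_write(struct fake_fd *fd, const char *buf, size_t off, size_t len)
{
	size_t n = len < fd->cap ? len : fd->cap;

	/* Mirror the proposed semantics: only whole pages count as
	 * written; fail with -EFAULT if no full page was copied. */
	n &= ~(PAGE_SIZE - 1);
	if (!n)
		return -EFAULT;
	memcpy(fd->mem + off, buf, n);
	return (long)n;
}

/* VMM-side retry loop: resume at the last fully written page boundary. */
static long write_all(struct fake_fd *fd, const char *buf, size_t off, size_t len)
{
	size_t done = 0;

	while (done < len) {
		long ret = fake_write(fd, buf + done, off + done, len - done);

		if (ret < 0)
			return ret;
		done += (size_t)ret;
	}
	return (long)done;
}
```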
