On 14/11/2025 15:18, Kalyazin, Nikita wrote:
On systems that support shared guest memory, write() is useful, for
example, for populating the initial image.  Although the same can be
achieved by mapping the memory into userspace and memcpying into it,
write() performs better because it does not need to set up user page
tables and does not take a page fault for every page as memcpy would.
Note that memcpy cannot be accelerated via MADV_POPULATE_WRITE, because
MADV_POPULATE_WRITE relies on GUP and is not supported by guest_memfd.
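
For illustration only, a minimal userspace sketch of the write-based
population path.  The helper names populate_via_write() and
demo_on_regular_file() are hypothetical and not part of this series.
The sketch assumes pwrite() reaches the same write path as write(), and
the demo runs against a regular temporary file as a stand-in, since a
real guest_memfd would have to be created with GUEST_MEMFD_FLAG_WRITE
on a kernel carrying these patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from buf into fd at offset,
 * retrying on EINTR and continuing after short writes.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int populate_via_write(int fd, const void *buf, size_t len, off_t offset)
{
	const uint8_t *p = buf;

	assert(buf != NULL);
	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			/* No progress: report an error instead of spinning. */
			errno = EIO;
			return -1;
		}
		p += n;
		offset += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Stand-in demo on a regular temporary file (NOT a guest_memfd):
 * write two 4 KiB chunks' worth of data and read it back.
 */
static int demo_on_regular_file(void)
{
	char path[] = "/tmp/gmem_write_sketch_XXXXXX";
	static uint8_t src[2 * 4096], back[2 * 4096];
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path);
	memset(src, 0xAB, sizeof(src));
	if (populate_via_write(fd, src, sizeof(src), 0) == 0 &&
	    pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
	    memcmp(src, back, sizeof(src)) == 0)
		ret = 0;
	close(fd);
	return ret;
}
```

On a real guest_memfd a short write could leave the next offset
unaligned, so a caller would probably want to treat that case as an
error rather than retry as this sketch does.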

Populating 512MiB of guest_memfd on an x86 machine:
  - via memcpy: 436 ms
  - via write:  202 ms (-54%)

Only PAGE_ALIGNED offset and len are allowed.  Non-aligned writes are
technically possible, but the restriction will make handling of mixed
shared/private huge pages simpler once in-place conversion support is
implemented [1].  write() will only be allowed to populate shared
pages.
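
As a rough sketch of what the alignment rule means for a caller, the
hypothetical helpers below (names are mine, not from the series) check
an (offset, len) pair against a page size and round an image length up
to a whole number of pages.  How the kernel reports a violation is not
covered here.  The page size is passed in explicitly; a real caller
would likely obtain it from sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if both offset and len are multiples of
 * page_size.  page_size must be a non-zero power of two.
 */
static bool is_page_aligned_range(int64_t offset, size_t len, size_t page_size)
{
	size_t mask = page_size - 1;

	assert(page_size != 0 && (page_size & mask) == 0);
	return offset >= 0 &&
	       ((uint64_t)offset & mask) == 0 &&
	       (len & mask) == 0;
}

/*
 * Hypothetical helper: round len up to the next multiple of page_size,
 * e.g. to size a zero-padded buffer for an image whose length is not
 * page-aligned.  page_size must be a non-zero power of two.
 */
static size_t round_up_to_page(size_t len, size_t page_size)
{
	size_t mask = page_size - 1;

	assert(page_size != 0 && (page_size & mask) == 0);
	return (len + mask) & ~mask;
}
```

A caller with an unaligned image could copy it into a zeroed buffer of
round_up_to_page() bytes before calling write(), assuming trailing
zeroes are acceptable for the image in question.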

When direct map removal is implemented [2]:
  - write() will not be allowed to access pages that have already
    been removed from the direct map
  - on completion, write() will remove the populated pages from
    the direct map

While it is technically possible to implement the read() syscall on
systems with shared guest memory, it is not supported as there is
currently no use case for it.

[1]
https://lore.kernel.org/kvm/[email protected]
[2]
https://lore.kernel.org/kvm/[email protected]

I failed to include links to previous versions:

v7:
 - Sean: add GUEST_MEMFD_FLAG_WRITE and documentation for it
 - Ackerley: only allow PAGE_ALIGNED offset and len
 - Sean/Ackerley: formatting fixes

v6:
 - https://lore.kernel.org/kvm/[email protected]
 - Make write support conditional on mmap support instead of relying on
   the up-to-date flag to decide whether writing to a page is allowed
 - James: Remove dependencies on folio_test_large
 - James: Remove page alignment restriction
 - James: Formatting fixes

v5:
 - https://lore.kernel.org/kvm/[email protected]
 - Replace the call to the unexported filemap_remove_folio with
   zeroing the bytes that could not be copied
 - Fix checkpatch findings

v4:
 - https://lore.kernel.org/kvm/[email protected]
 - Switch from implementing the write callback to write_iter
 - Remove conditional compilation

v3:
 - https://lore.kernel.org/kvm/[email protected]
 - David/Mike D: Only compile support for the write syscall if
   CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.

v2:
 - https://lore.kernel.org/kvm/[email protected]
 - Switch from an ioctl to the write syscall to implement population

v1:
 - https://lore.kernel.org/kvm/[email protected]


Nikita Kalyazin (2):
   KVM: guest_memfd: add generic population via write
   KVM: selftests: update guest_memfd write tests

  Documentation/virt/kvm/api.rst                |  2 +
  include/linux/kvm_host.h                      |  2 +-
  include/uapi/linux/kvm.h                      |  1 +
  .../testing/selftests/kvm/guest_memfd_test.c  | 58 +++++++++++++++++--
  virt/kvm/guest_memfd.c                        | 52 +++++++++++++++++
  5 files changed, 108 insertions(+), 7 deletions(-)


base-commit: 8a4821412cf2c1429fffa07c012dd150f2edf78c
--
2.50.1


