On 14/11/2025 15:18, Kalyazin, Nikita wrote:
On systems that support shared guest memory, write() is useful, for
example, for populating the initial image. Although the same can be
achieved by mapping the memory into userspace and copying with memcpy,
write() is faster because it does not need to set up user page tables
and does not take a page fault for every page as memcpy would. Note
that memcpy cannot be accelerated via MADV_POPULATE_WRITE, because
MADV_POPULATE_WRITE relies on GUP and is not supported by guest_memfd.
Populating 512MiB of guest_memfd on an x86 machine:
- via memcpy: 436 ms
- via write: 202 ms (-54%)
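
As an illustration of the write() path measured above, here is a minimal
userspace sketch; it is not code from this series. populate_via_write()
is a hypothetical helper name. The sketch assumes the fd is a guest_memfd
created with write support and that pwrite() is accepted alongside
write(). It loops because a write may complete partially.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes from src into fd, starting at offset, using pwrite().
 * Returns 0 on success, -1 with errno set on failure.
 *
 * Assumption: for a guest_memfd, offset and len are page aligned and
 * partial writes stop on a page boundary, so resuming stays aligned.
 * The helper itself works on any fd that supports pwrite().
 */
static int populate_via_write(int fd, const void *src, size_t len,
			      off_t offset)
{
	const uint8_t *p = src;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0) {
			errno = EIO;		/* no progress, give up */
			return -1;
		}
		p += n;
		offset += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would pass the guest_memfd fd, the image buffer, and a
page-aligned length and offset. Because the helper only relies on
pwrite(), it can be exercised against a regular file outside KVM.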
Only PAGE_ALIGNED offset and len are allowed. Non-aligned writes are
technically possible, but once in-place conversion support is
implemented [1], the restriction will make handling of mixed
shared/private huge pages simpler. write() will only be allowed to
populate shared pages.
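
The alignment rule can be checked in userspace before issuing the write.
This is a minimal sketch, assuming the page size reported by
sysconf(_SC_PAGESIZE) matches the kernel's PAGE_SIZE and is a power of
two. range_is_page_aligned() is a hypothetical helper, not part of this
series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return true if both offset and len are multiples of the page size.
 * Assumption: the page size is a power of two, so masking the low
 * bits is equivalent to a modulo check.
 */
static bool range_is_page_aligned(off_t offset, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t mask;

	if (page_size <= 0 || offset < 0)
		return false;

	mask = (size_t)page_size - 1;
	return ((size_t)offset & mask) == 0 && (len & mask) == 0;
}
```

Checking up front lets a VMM pad or split its image buffer itself
rather than discover the restriction from a failed write().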
When direct map removal is implemented [2]:
- write() will not be allowed to access pages that have already
  been removed from the direct map
- on completion, write() will remove the populated pages from
  the direct map
While it is technically possible to implement the read() syscall on
systems with shared guest memory, it is not supported as there is
currently no use case for it.
[1] https://lore.kernel.org/kvm/[email protected]
[2] https://lore.kernel.org/kvm/[email protected]
I failed to include links to previous versions:
v7:
- Sean: add GUEST_MEMFD_FLAG_WRITE and documentation for it
- Ackerley: only allow PAGE_ALIGNED offset and len
- Sean/Ackerley: formatting fixes
v6:
- https://lore.kernel.org/kvm/[email protected]
- Make write support conditional on mmap support instead of relying on
  the up-to-date flag to decide whether writing to a page is allowed
- James: Remove dependencies on folio_test_large
- James: Remove page alignment restriction
- James: Formatting fixes
v5:
- https://lore.kernel.org/kvm/[email protected]
- Replace the call to the unexported filemap_remove_folio with
  zeroing the bytes that could not be copied
- Fix checkpatch findings
v4:
- https://lore.kernel.org/kvm/[email protected]
- Switch from implementing the write callback to write_iter
- Remove conditional compilation
v3:
- https://lore.kernel.org/kvm/[email protected]
- David/Mike D: Only compile support for the write syscall if
  CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.
v2:
- https://lore.kernel.org/kvm/[email protected]
- Switch from an ioctl to the write syscall to implement population
v1:
- https://lore.kernel.org/kvm/[email protected]
Nikita Kalyazin (2):
  KVM: guest_memfd: add generic population via write
  KVM: selftests: update guest_memfd write tests

 Documentation/virt/kvm/api.rst                |  2 +
 include/linux/kvm_host.h                      |  2 +-
 include/uapi/linux/kvm.h                      |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 58 +++++++++++++++++--
 virt/kvm/guest_memfd.c                        | 52 +++++++++++++++++
 5 files changed, 108 insertions(+), 7 deletions(-)

base-commit: 8a4821412cf2c1429fffa07c012dd150f2edf78c
--
2.50.1