From: "Kiryl Shutsemau (Meta)" <[email protected]>

Read-write protect mode (UFFDIO_REGISTER_MODE_RWP) is supported starting
from Linux 7.2. It traps every access -- read or write -- to a present
page within a registered range. The matching UAPI consists of:

  - UFFDIO_REGISTER_MODE_RWP    registration-mode bit
  - UFFD_FEATURE_RWP            capability bit
  - UFFD_FEATURE_RWP_ASYNC      async (in-kernel) fault resolution
  - UFFDIO_RWPROTECT            install / remove RWP on a range
  - UFFDIO_SET_MODE             runtime sync/async toggle
  - UFFD_PAGEFAULT_FLAG_RWP     new pagefault.flags bit

Document the new registration-mode entry, the "Userfaultfd read-write
protect mode" section, the new pagefault flag, and a VERSIONS line.

Signed-off-by: Kiryl Shutsemau <[email protected]>
---
 man2/userfaultfd.2 | 147 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index cee7c01d2512..0e702f2f4969 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -24,7 +24,7 @@
 .\" the source, must acknowledge the copyright and authors of this work.
 .\" %%%LICENSE_END
 .\"
-.TH USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH USERFAULTFD 2 2026-05-22 "Linux" "Linux Programmer's Manual"
 .SH NAME
 userfaultfd \- create a file descriptor for handling page faults in user space
 .SH SYNOPSIS
@@ -105,6 +105,28 @@ The faulted thread will be stopped from execution until
 user-space write-unprotects the page using an
 .B UFFDIO_WRITEPROTECT
 ioctl.
+.TP
+.BR UFFDIO_REGISTER_MODE_RWP " (since Linux 7.2)"
+When registered with
+.B UFFDIO_REGISTER_MODE_RWP
+mode, user-space will receive a page-fault notification
+on any access \(em read or write \(em to a present page within the range.
+By default the faulted thread will be stopped from execution until
+user-space removes the protection using a
+.B UFFDIO_RWPROTECT
+ioctl;
+if
+.B UFFD_FEATURE_RWP_ASYNC
+was negotiated, the kernel restores access in place and the faulted
+thread continues without blocking.
+.IP
+.B UFFDIO_REGISTER_MODE_RWP
+and
+.B UFFDIO_REGISTER_MODE_WP
+cannot be combined on the same range; attempting to register with both
+bits set returns
+.BR EINVAL .
+See the "Userfaultfd read-write protect mode" section below.
 .PP
 Multiple modes can be enabled at the same time for the same memory range.
 .PP
@@ -186,6 +208,21 @@ The user needs to resolve the page fault by unprotecting the faulted page and
 kicking the faulted thread to continue.
 For more information,
 please refer to the "Userfaultfd write-protect mode" section.
+.PP
+Since Linux 7.2, userfaultfd can do read-write protection tracking, which
+traps every access (read or write) to a present page within a registered
+range.
+One should check against the feature bit
+.B UFFD_FEATURE_RWP
+before using this feature, and optionally negotiate
+.B UFFD_FEATURE_RWP_ASYNC
+to have the kernel auto-restore page permissions on fault without
+delivering a notification.
+This mode is intended for working-set tracking by VM memory managers and
+similar callers; cold pages can then be evicted using independent kernel
+interfaces.
+For more information,
+please refer to the "Userfaultfd read-write protect mode" section.
 .\"
 .SS Userfaultfd operation
 After the userfaultfd object is created with
@@ -322,6 +359,98 @@
 should have the flag cleared upon the faulted page or range.
 .PP
 Write-protect mode supports only private anonymous memory.
+.SS Userfaultfd read-write protect mode (since Linux 7.2)
+Since Linux 7.2, userfaultfd supports read-write protect mode.
+Unlike write-protect mode, every access \(em read or write \(em to a
+protected present page generates a userfaultfd notification.
+It works on anonymous, shmem, and hugetlbfs mappings.
+.PP
+The user needs to first check availability of this feature using the
+.B UFFDIO_API
+ioctl against the feature bit
+.B UFFD_FEATURE_RWP
+before using this mode.
+On kernels or architectures that cannot support read-write protection,
+the bit is masked out from
+.I uffdio_api.features
+on return from
+.BR UFFDIO_API ;
+callers should inspect the returned features and fall back to another
+tracking mechanism when the bit is absent.
+.PP
+To register with userfaultfd read-write protect mode, the user needs to
+initiate the
+.B UFFDIO_REGISTER
+ioctl with mode
+.B UFFDIO_REGISTER_MODE_RWP
+set.
+.B UFFDIO_REGISTER_MODE_RWP
+cannot be combined with
+.BR UFFDIO_REGISTER_MODE_WP ;
+however it can be combined with
+.B UFFDIO_REGISTER_MODE_MISSING
+when the caller also wants notifications for fresh page populations.
+.PP
+After registration, the user can read-write-protect any existing memory
+within the range using the
+.B UFFDIO_RWPROTECT
+ioctl where
+.I uffdio_rwprotect.mode
+is set to
+.BR UFFDIO_RWPROTECT_MODE_RWP .
+Read-write protection only affects pages that are currently populated
+in the range; unpopulated addresses remain unpopulated and fall through
+to the normal missing-page path on first access.
+.PP
+Protection is preserved across page reclaim and migration; it is
+.I not
+preserved across operations that drop the underlying page
+.RB ( "MADV_DONTNEED " "on anonymous memory, hole-punch on shmem,"
+truncation of a file mapping).
+Callers must re-arm the range with
+.B UFFDIO_RWPROTECT
+after any such operation.
+.PP
+When an access fault happens against a protected page, user-space will
+receive a page-fault notification whose
+.I uffd_msg.pagefault.flags
+field has the
+.B UFFD_PAGEFAULT_FLAG_RWP
+bit set.
+.PP
+To resolve a read-write-protect page fault, the user initiates another
+.B UFFDIO_RWPROTECT
+ioctl whose
+.I uffdio_rwprotect.mode
+has the
+.B UFFDIO_RWPROTECT_MODE_RWP
+flag cleared.
+This restores the original VMA permissions on the affected pages and
+wakes any blocked threads (unless
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+is also set).
+.PP
+If
+.B UFFD_FEATURE_RWP_ASYNC
+was negotiated alongside
+.BR UFFD_FEATURE_RWP ,
+the kernel resolves access faults in place without delivering a
+notification: page permissions are restored automatically and the
+faulting thread continues.
+Callers can later reconstruct which pages were touched by inspecting the
+.B PAGE_IS_ACCESSED
+bit returned by the
+.B PAGEMAP_SCAN
+ioctl described in
+.BR ioctl_userfaultfd (2)
+and
+.IR Documentation/admin\-guide/mm/pagemap.rst
+in the Linux kernel source.
+.PP
+The async mode can be toggled at runtime using the
+.B UFFDIO_SET_MODE
+ioctl, which lets a single userfaultfd switch between async detection
+and synchronous eviction without re-registering the range.
 .SS Reading from the userfaultfd structure
 Each
 .BR read (2)
@@ -473,6 +602,12 @@ If the address is in a range that was registered with the
 .B UFFDIO_REGISTER_MODE_WP
 flag, when this bit is set, it means it is a write-protect fault.
 Otherwise it is a page-missing fault.
+.TP
+.BR UFFD_PAGEFAULT_FLAG_RWP " (since Linux 7.2)"
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_RWP
+flag, this bit indicates that the fault was triggered by an access to a
+read-write-protected page (either a read or a write).
 .RE
 .TP
 .I pagefault.feat.pid
@@ -574,6 +709,16 @@ system call first appeared in Linux 4.3.
 .PP
 The support for hugetlbfs and shared memory areas
 and non-page-fault events was added in Linux 4.11
+.PP
+Read-write protect mode
+.RB ( UFFDIO_REGISTER_MODE_RWP ", " UFFD_FEATURE_RWP ", "
+.BR UFFDIO_RWPROTECT )
+was added in Linux 7.2,
+together with
+.B UFFD_FEATURE_RWP_ASYNC
+and the
+.B UFFDIO_SET_MODE
+runtime mode toggle.
 .SH CONFORMING TO
 .BR userfaultfd ()
 is Linux-specific and should not be used in programs intended to be
-- 
2.51.2
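
For illustration only (not part of the patch): the feature check that the
"Userfaultfd read-write protect mode" section asks callers to perform could
look roughly like the C sketch below. It assumes only the existing
userfaultfd(2) system call and the UFFDIO_API handshake with features set to
zero. The helper names classify_rwp() and uffd_query_features() and the
rwp_support enum are hypothetical. The RWP bit values are passed in as
parameters because the patch text does not give their numeric values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical result type: which RWP flavour the kernel reported. */
enum rwp_support { RWP_NONE, RWP_SYNC, RWP_ASYNC };

/*
 * Classify a feature mask returned by UFFDIO_API.
 * rwp_bit and async_bit are supplied by the caller (assumption: they would
 * be UFFD_FEATURE_RWP and UFFD_FEATURE_RWP_ASYNC on headers that have them).
 * Async is only meaningful alongside the base RWP bit.
 */
enum rwp_support classify_rwp(uint64_t features, uint64_t rwp_bit,
                              uint64_t async_bit)
{
    if (!(features & rwp_bit))
        return RWP_NONE;    /* caller should fall back to another mechanism */
    return (features & async_bit) ? RWP_ASYNC : RWP_SYNC;
}

/*
 * Open a userfaultfd and ask the kernel which features it supports by
 * calling UFFDIO_API with features == 0.
 * Returns the file descriptor, or -1 on failure (for example when the
 * system call is unavailable or not permitted).
 */
int uffd_query_features(uint64_t *supported)
{
    assert(supported != NULL);

    int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        close(fd);
        return -1;
    }

    *supported = api.features;  /* bits the kernel reports as supported */
    return fd;
}
```

A caller built against headers that define the new bits would presumably pass
UFFD_FEATURE_RWP and UFFD_FEATURE_RWP_ASYNC as rwp_bit and async_bit, and
switch to another tracking mechanism when classify_rwp() returns RWP_NONE.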

