On Wed, May 13, 2026 at 09:26:17AM +0300, Mike Rapoport wrote:
> On Fri, May 08, 2026 at 04:55:26PM +0100, Kiryl Shutsemau (Meta) wrote:
> > Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP:
> >
> > - sync and async fault models;
> > - UFFDIO_RWPROTECT semantics;
> > - UFFD_FEATURE_RWP_ASYNC;
> > - UFFDIO_SET_MODE runtime mode flips.
> >
> > It also covers typical VMM working-set-tracking workflow from detection
> > loop through sync-mode eviction and back to async.
>
> We'd also need man page update at some point :)

Will add a patch for man-pages in v3.

> > Signed-off-by: Kiryl Shutsemau <[email protected]>
> > Assisted-by: Claude:claude-opus-4-6
> > ---
> >  Documentation/admin-guide/mm/userfaultfd.rst | 226 ++++++++++++++++++-
> >  1 file changed, 220 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> > index 1e533639fd50..5ac4ae3dff1b 100644
> > --- a/Documentation/admin-guide/mm/userfaultfd.rst
> > +++ b/Documentation/admin-guide/mm/userfaultfd.rst
> > @@ -275,16 +275,16 @@ tracking and it can be different in a few ways:
> >    - Dirty information will not get lost if the pte was zapped due to
> >      various reasons (e.g. during split of a shmem transparent huge page).
> >
> > -  - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit
> > -    set; dirty when uffd-wp bit cleared), it has different semantics on
> > -    some of the memory operations. For example: ``MADV_DONTNEED`` on
> > +  - Due to a reverted meaning of soft-dirty (page clean when the uffd bit
> > +    is set; dirty when the uffd bit is cleared), it has different semantics
> > +    on some of the memory operations. For example: ``MADV_DONTNEED`` on
> >      anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
> > -    dirtying of memory by dropping uffd-wp bit during the procedure.
> > +    dirtying of memory by dropping the uffd bit during the procedure.
> >
> >  The user app can collect the "written/dirty" status by looking up the
> > -uffd-wp bit for the pages being interested in /proc/pagemap.
> > +uffd bit for the pages being interested in /proc/pagemap.
> >
> > -The page will not be under track of uffd-wp async mode until the page is
> > +The page will not be under track of userfaultfd-wp async mode until the page is
> >  explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
> >  flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault
> >  that was tracked by async mode userfaultfd-wp is invalid.
> > @@ -307,6 +307,220 @@ transparent to the guest, we want that same address range to act as if it was
> >  still poisoned, even though it's on a new physical host which ostensibly
> >  doesn't have a memory error in the exact same spot.
> >
> > +Read-Write Protection
> > +---------------------
> > +
> > +``UFFDIO_REGISTER_MODE_RWP`` enables read-write protection tracking on a
> > +memory range. It is similar to (but faster than) ``mprotect(PROT_NONE)``
> > +combined with a signal handler; unlike ``mprotect(PROT_NONE)``, RWP only
> > +traps accesses to *present* PTEs, so accesses to unpopulated addresses in a
> > +protected range fall through to the normal missing-page path. It uses the
> > +PROT_NONE hinting mechanism (same as NUMA balancing) to make pages
> > +inaccessible while keeping them resident in memory. Works on anonymous,
> > +shmem, and hugetlbfs memory.
> > +
> > +This is designed for VM memory managers that need to track the working set
>
> This feature? Or RWP mode?

RWP.

> > +of guest memory for cold page eviction to tiered or remote storage.
> > +
> > +**Setup:**
> > +
> > +1. Open a userfaultfd and enable ``UFFD_FEATURE_RWP`` via ``UFFDIO_API``.
> > +   Optionally request ``UFFD_FEATURE_RWP_ASYNC`` as well — it requires
> > +   ``UFFD_FEATURE_RWP`` to be set in the same ``UFFDIO_API`` call.
> > +
> > +2. Register the guest memory range with ``UFFDIO_REGISTER_MODE_RWP``
> > +   (and ``UFFDIO_REGISTER_MODE_MISSING`` if evicted pages will need to be
> > +   fetched back from storage).
> > +
> > +**Feature availability:**
> > +
> > +RWP is built on top of two kernel primitives: a spare PTE bit owned by
> > +userfaultfd (``CONFIG_HAVE_ARCH_USERFAULTFD_WP``) and arch support for
>
> Please spell out architecture.

Ack.

> > +present-but-inaccessible PTEs (``CONFIG_ARCH_HAS_PTE_PROTNONE``). When both
> > +are available on a 64-bit kernel, the build selects
> > +``CONFIG_USERFAULTFD_RWP=y`` and the ``VM_UFFD_RWP`` VMA flag becomes
> > +available.
> > +
> > +``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC`` are masked out of the
> > +features returned by ``UFFDIO_API`` when the running kernel or architecture
> > +cannot support them — for example 32-bit kernels (where ``VM_UFFD_RWP`` is
> > +unavailable), kernels built without ``CONFIG_USERFAULTFD_RWP``, and
> > +architectures whose ptes cannot carry the uffd bit at runtime (e.g. riscv
> > +without the ``SVRSW60T59B`` extension). ``UFFDIO_API`` does not fail;
> > +unsupported bits are simply absent from ``uffdio_api.features`` on return.
> > +VMMs should inspect the returned ``features`` after ``UFFDIO_API`` and fall
>
> Lets s/VMM/Callers/.
> Although RWP is designed for VMMs, it's not limited to them and I expect
> other use-cases will be coming along.

Okay.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

