On Sun, Mar 15, 2026 at 04:23:14PM -0700, Suren Baghdasaryan wrote:
> On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <[email protected]> wrote:
> >
> > This documentation makes it easier for a driver/file system implementer to
> > correctly use this callback.
> >
> > It covers the fundamentals, whilst intentionally leaving the less lovely
> > possible actions one might take undocumented (for instance - the
> > success_hook, error_hook fields in mmap_action).
> >
> > The document also covers the new VMA flags implementation which is the only
> > one which will work correctly with mmap_prepare.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <[email protected]>
> > ---
> > Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> > 1 file changed, 131 insertions(+)
> > create mode 100644 Documentation/filesystems/mmap_prepare.rst
> >
> > diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> > new file mode 100644
> > index 000000000000..76908200f3a1
> > --- /dev/null
> > +++ b/Documentation/filesystems/mmap_prepare.rst
> > @@ -0,0 +1,131 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===========================
> > +mmap_prepare callback HOWTO
> > +===========================
> > +
> > +Introduction
> > +############
> > +
> > +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> > +stability and security risk, and doesn't always permit the merging of adjacent
> > +mappings resulting in unnecessary memory fragmentation.
> > +
> > +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> > +these problems.
> > +
> > +How To Use
> > +==========
> > +
> > +In your driver's `struct file_operations`, specify an `mmap_prepare`
> > +callback rather than an `mmap` one, e.g. for ext4:
> > +
> > +
> > +.. code-block:: C
> > +
> > + const struct file_operations ext4_file_operations = {
> > + ...
> > + .mmap_prepare = ext4_file_mmap_prepare,
> > + };
> > +
> > +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> > +
> > +Examining the `struct vm_area_desc` type:
> > +
> > +.. code-block:: C
> > +
> > + struct vm_area_desc {
> > + /* Immutable state. */
> > + const struct mm_struct *const mm;
> > + struct file *const file; /* May vary from vm_file in stacked callers. */
> > + unsigned long start;
> > + unsigned long end;
> > +
> > + /* Mutable fields. Populated with initial state. */
> > + pgoff_t pgoff;
> > + struct file *vm_file;
> > + vma_flags_t vma_flags;
> > + pgprot_t page_prot;
> > +
> > + /* Write-only fields. */
> > + const struct vm_operations_struct *vm_ops;
> > + void *private_data;
> > +
> > + /* Take further action? */
> > + struct mmap_action action;
>
> So, action still belongs to /* Write-only fields. */ section? This is
> nitpicky, but it might be better to have this as:
>
> /* Write-only fields. */
> const struct vm_operations_struct *vm_ops;
> void *private_data;
> struct mmap_action action; /* Take further action? */
Absolutely not. This field is not to be written to by the user.
We sadly have to allow hugetlb to do some hacks, but these are things we don't
want to point out.
Users should use mmap_action_xxx() functions.
>
> > + };
> > +
> > +This is straightforward - you have all the fields you need to set up the
> > +mapping, and you can update the mutable and writable fields, for instance:
> > +
> > +.. code-block:: C
> > +
> > + static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> > + {
> > + int ret;
> > + struct file *file = desc->file;
> > + struct inode *inode = file->f_mapping->host;
> > +
> > + ...
> > +
> > + file_accessed(file);
> > + if (IS_DAX(file_inode(file))) {
> > + desc->vm_ops = &ext4_dax_vm_ops;
> > + vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> > + } else {
> > + desc->vm_ops = &ext4_file_vm_ops;
> > + }
> > + return 0;
> > + }
> > +
> > +Importantly, you no longer have to dance around with reference counts or locks
> > +when updating these fields - **you can simply go ahead and change them**.
> > +
> > +Everything is taken care of by the mapping code.
> > +
> > +VMA Flags
> > +=========
> > +
> > +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> > +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> > +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> > +locking done correctly for you), this is no longer necessary.
> > +
> > +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> > +etc. - i.e. using a `VM_xxx` macro - has changed too.
> > +
> > +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> > +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> > +of (where `desc` is a pointer to `struct vm_area_desc`):
> > +
> > +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> > + wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
> > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either is set,
> > + otherwise `false`.
> > +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> > + additional flags specified by a comma-separated list,
> > + e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> > +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> > + flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
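
To make the any/set/clear semantics of the three helpers above concrete, here
is a minimal userspace sketch. It is a model only: the `demo_*` names, the
`DEMO_*_BIT` values and the single `unsigned long` flag word are assumptions
made for illustration. The real helpers operate on `vma_flags_t` inside the
kernel and their implementation may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers; the real VMA_xxx_BIT values live in kernel headers. */
enum { DEMO_READ_BIT, DEMO_WRITE_BIT, DEMO_MAYWRITE_BIT, DEMO_IO_BIT };

/* Stand-in for the flags part of a descriptor; NOT struct vm_area_desc. */
struct demo_desc {
	unsigned long flags;
};

/* OR together one mask bit per listed bit number. */
static unsigned long demo_mask(const int *bits, size_t n)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= 1UL << bits[i];
	return mask;
}

/* Turn a comma-separated list of bit numbers into a mask. */
#define DEMO_MASK(...) \
	demo_mask((const int[]){ __VA_ARGS__ }, \
		  sizeof((const int[]){ __VA_ARGS__ }) / sizeof(int))

/* True if ANY of the listed bits is set. */
#define demo_test_flags(desc, ...) \
	(((desc)->flags & DEMO_MASK(__VA_ARGS__)) != 0)

/* Set every listed bit, leaving the others untouched. */
#define demo_set_flags(desc, ...) \
	((void)((desc)->flags |= DEMO_MASK(__VA_ARGS__)))

/* Clear every listed bit, leaving the others untouched. */
#define demo_clear_flags(desc, ...) \
	((void)((desc)->flags &= ~DEMO_MASK(__VA_ARGS__)))
```

In this model, setting `DEMO_MAYWRITE_BIT` alone is enough for
`demo_test_flags(desc, DEMO_WRITE_BIT, DEMO_MAYWRITE_BIT)` to return true,
which is the "any" behaviour the list describes.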
> > +
> > +Actions
> > +=======
> > +
> > +You can now very easily have actions be performed upon a mapping once set up by
> > +utilising simple helper functions invoked upon the `struct vm_area_desc`
> > +pointer. These are:
> > +
> > +* `mmap_action_remap()` - Remaps a range consisting only of PFNs, for a specific
> > + range of a set size starting at a given virtual address and PFN.
> > +
> > +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> > + entire mapping from `start_pfn` onward.
> > +
> > +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> > + remap.
> > +
> > +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> > + the entire mapping from `start_pfn` onward.
> > +
> > +**NOTE:** The 'action' field should never normally be manipulated directly,
> > +rather you ought to use one of these helpers.
>
> I'm guessing the start and size parameters passed to
> mmap_action_remap() and such are restricted by vm_area_desc.start
> vm_area_desc.end. If so, should we document those restrictions and
> enforce them in the code?
I mean, these are the same restrictions that all of these functions already
apply if you were to use them with a VMA descriptor.
I think implicitly a remap will fail if you try it out of the VMA range at the
point of applying the change.
But it might be worth adding range_in_vma_desc() checks at prepare time, will
see if I can do that for the respin.
I think it's pretty obvious that you shouldn't be trying to remap totally
unrelated memory, so I'm not sure that's at a level of granularity that's suited
to this document though.
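
Purely for illustration, a prepare-time check of that kind could look roughly
like the userspace sketch below. Everything in it is an assumption: the
`sketch_desc_range` struct only mirrors the `start`/`end` fields of the
descriptor, `sketch_range_in_desc()` is a hypothetical name, and it is not
the eventual `range_in_vma_desc()` implementation.

```c
#include <stdbool.h>

/* Stand-in for the range fields of struct vm_area_desc; NOT the kernel type. */
struct sketch_desc_range {
	unsigned long start;	/* first address of the mapping */
	unsigned long end;	/* one past the last address */
};

/*
 * Hypothetical check: does [start, start + size) lie wholly inside
 * [desc->start, desc->end)? The comparison is arranged so that
 * start + size is never computed and therefore cannot overflow.
 */
static bool sketch_range_in_desc(const struct sketch_desc_range *desc,
				 unsigned long start, unsigned long size)
{
	if (start < desc->start || start > desc->end)
		return false;
	return size <= desc->end - start;
}
```

With a descriptor spanning 0x1000 to 0x3000, a request for 0x2000 bytes at
0x1000 passes, while the same size at 0x2000 is rejected because it would run
past the end.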
>
> > --
> > 2.53.0
> >