On Thu, Mar 26, 2026 at 06:13:37PM -0700, James Houghton wrote:
> On Fri, Mar 6, 2026 at 9:19 AM Mike Rapoport <[email protected]> wrote:
> >
> > From: "Mike Rapoport (Microsoft)" <[email protected]>
> >
> > Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use
> > them in __mfill_atomic_pte() to add shmem folios to page cache and
> > remove them in case of error.
> >
> > Implement these methods in shmem along with vm_uffd_ops->alloc_folio()
> > and drop shmem_mfill_atomic_pte().
> >
> > Since userfaultfd now does not reference any functions from shmem, drop
> > the include of linux/shmem_fs.h from mm/userfaultfd.c.
> >
> > mfill_atomic_install_pte() is not used anywhere outside of
> > mm/userfaultfd.c, so make it static.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
> > ---
> > include/linux/shmem_fs.h | 14 ----
> > include/linux/userfaultfd_k.h | 21 +++--
> > mm/shmem.c | 148 ++++++++++++----------------------
> > mm/userfaultfd.c | 79 +++++++++---------
> > 4 files changed, 106 insertions(+), 156 deletions(-)
> >
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index a8273b32e041..1a345142af7d 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -221,20 +221,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
> >
> > extern bool shmem_charge(struct inode *inode, long pages);
> >
> > -#ifdef CONFIG_USERFAULTFD
> > -#ifdef CONFIG_SHMEM
> > -extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> > - struct vm_area_struct *dst_vma,
> > - unsigned long dst_addr,
> > - unsigned long src_addr,
> > - uffd_flags_t flags,
> > - struct folio **foliop);
> > -#else /* !CONFIG_SHMEM */
> > -#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
> > - src_addr, flags, foliop) ({ BUG(); 0; })
> > -#endif /* CONFIG_SHMEM */
> > -#endif /* CONFIG_USERFAULTFD */
> > -
> > /*
> > * Used space is stored as unsigned 64-bit value in bytes but
> > * quota core supports only signed 64-bit values so use that
> > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > index 4d8b879eed91..bf4e595ac914 100644
> > --- a/include/linux/userfaultfd_k.h
> > +++ b/include/linux/userfaultfd_k.h
> > @@ -93,10 +93,24 @@ struct vm_uffd_ops {
> > struct folio *(*get_folio_noalloc)(struct inode *inode, pgoff_t pgoff);
> > /*
> > * Called during resolution of UFFDIO_COPY request.
> > - * Should return allocate a and return folio or NULL if allocation fails.
> > + * Should allocate and return a folio or NULL if allocation
> > + * fails.
> > */
> > struct folio *(*alloc_folio)(struct vm_area_struct *vma,
> > unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request.
> > + * Should lock the folio and add it to VMA's page cache.
>
> I don't think "should lock the folio" is accurate. That sounds like
> "it will call folio_lock()" but it actually calls
> __folio_set_locked(). Maybe this is better:
>
> "Should only be called with a folio returned by alloc_folio() above.
> The folio will be set to locked."
Yeah, sounds good.
> > + * Returns 0 on success, error code on failure.
> > + */
> > + int (*filemap_add)(struct folio *folio, struct vm_area_struct *vma,
> > + unsigned long addr);
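For illustration, the intended alloc_folio()/filemap_add() contract can be
modelled in plain userspace C. This is only a toy sketch: the `struct folio`,
the fixed-size cache array, and the error value are stand-ins, and only the
names alloc_folio/filemap_add come from the patch; the real implementation
lives in shmem and uses __folio_set_locked() plus the page cache proper.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a folio; "locked" models the folio lock bit. */
struct folio {
	int locked;
};

#define CACHE_SLOTS 8
static struct folio *page_cache[CACHE_SLOTS];

/* Models vm_uffd_ops->alloc_folio(): returns a fresh, unlocked folio. */
static struct folio *alloc_folio(struct folio *backing)
{
	backing->locked = 0;
	return backing;
}

/*
 * Models vm_uffd_ops->filemap_add(): only valid for a folio that came
 * from alloc_folio() above. The folio is not yet visible to anyone
 * else, so it is marked locked directly (the kernel analogue is
 * __folio_set_locked(), not folio_lock()) and then inserted.
 */
static int filemap_add(struct folio *folio, unsigned long index)
{
	if (index >= CACHE_SLOTS)
		return -1;		/* stands in for an errno */
	folio->locked = 1;		/* __folio_set_locked() analogue */
	page_cache[index] = folio;
	return 0;
}
```

The point the model captures is the one raised above: the folio is *set*
locked, not locked via the sleeping lock path, because no other user can
hold a reference yet.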
> > @@ -404,6 +400,9 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> >
> > set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> >
> > + if (page_in_cache)
> > + folio_unlock(folio);
>
> I don't really like doing the folio_unlock() *here*, I think it's
> clearer if the callers (mfill_atomic_pte_continue() and
> __mfill_atomic_pte()) unlocked the folio themselves. But that's just
> my opinion.
We already have page_in_cache here, so I'd prefer to keep a single
folio_unlock() rather than add an additional if (page_in_cache) check in
__mfill_atomic_pte().
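The design choice argued for above can be shown with a minimal userspace
model (hypothetical names, toy `struct folio`): page_in_cache is already
computed inside the install helper, so the unlock for cache-backed folios
lives in exactly one place instead of being duplicated in both callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a folio; "locked" models the folio lock bit. */
struct folio {
	int locked;
};

/*
 * Models mfill_atomic_install_pte() after the patch: the helper already
 * knows whether the folio is page-cache backed, so it performs the
 * single folio_unlock() itself on that path. Anonymous folios are left
 * alone.
 */
static int install_pte(struct folio *folio, bool page_in_cache)
{
	/* ... pte setup elided ... */
	if (page_in_cache)
		folio->locked = 0;	/* folio_unlock() analogue */
	return 0;
}
```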
> > +
> > /* No need to invalidate - it was non-present before */
> > update_mmu_cache(dst_vma, dst_addr, dst_pte);
> > ret = 0;
> > @@ -836,41 +856,18 @@ extern ssize_t mfill_atomic_hugetlb(struct userfaultfd_ctx *ctx,
> >
> > static __always_inline ssize_t mfill_atomic_pte(struct mfill_state *state)
> > {
> > - struct vm_area_struct *dst_vma = state->vma;
> > - unsigned long src_addr = state->src_addr;
> > - unsigned long dst_addr = state->dst_addr;
> > - struct folio **foliop = &state->folio;
> > uffd_flags_t flags = state->flags;
> > - pmd_t *dst_pmd = state->pmd;
> > - ssize_t err;
> >
> > if (uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> > return mfill_atomic_pte_continue(state);
> > if (uffd_flags_mode_is(flags, MFILL_ATOMIC_POISON))
> > return mfill_atomic_pte_poison(state);
> > + if (uffd_flags_mode_is(flags, MFILL_ATOMIC_COPY))
> > + return mfill_atomic_pte_copy(state);
> > + if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE))
> > + return mfill_atomic_pte_zeropage(state);
>
> Thanks for this cleanup. :)
>
> >
> > - /*
> > - * The normal page fault path for a shmem will invoke the
> > - * fault, fill the hole in the file and COW it right away. The
> > - * result generates plain anonymous memory. So when we are
> > - * asked to fill an hole in a MAP_PRIVATE shmem mapping, we'll
> > - * generate anonymous memory directly without actually filling
> > - * the hole. For the MAP_PRIVATE case the robustness check
> > - * only happens in the pagetable (to verify it's still none)
> > - * and not in the radix tree.
> > - */
> > - if (!(dst_vma->vm_flags & VM_SHARED)) {
> > - if (uffd_flags_mode_is(flags, MFILL_ATOMIC_COPY))
> > - err = mfill_atomic_pte_copy(state);
> > - else
> > - err = mfill_atomic_pte_zeropage(state);
> > - } else {
> > - err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
> > - dst_addr, src_addr,
> > - flags, foliop);
> > - }
> > -
> > - return err;
> > + return -EOPNOTSUPP;
>
> WARN_ONCE() here I think.
I'll add VM_WARN_ONCE() here.
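For clarity, here is a userspace sketch of the flattened dispatch plus the
warn-once fallback being discussed. The mode names and helpers are toy
stand-ins for the uffd_flags_mode_is() checks; the warn-once flag models
VM_WARN_ONCE(), which in the kernel fires at most once per boot.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

enum mfill_mode {
	MODE_CONTINUE,
	MODE_POISON,
	MODE_COPY,
	MODE_ZEROPAGE,
	MODE_BOGUS,		/* an unexpected mode, for the fallback */
};

/* Trivial stand-ins for the per-mode handlers. */
static int pte_continue(void) { return 0; }
static int pte_poison(void)   { return 0; }
static int pte_copy(void)     { return 0; }
static int pte_zeropage(void) { return 0; }

static int warned;

/* Models mfill_atomic_pte() after the cleanup: flat early-return
 * dispatch, with a warn-once + -EOPNOTSUPP fallback for modes that
 * should never reach this point. */
static int mfill_atomic_pte(enum mfill_mode mode)
{
	if (mode == MODE_CONTINUE)
		return pte_continue();
	if (mode == MODE_POISON)
		return pte_poison();
	if (mode == MODE_COPY)
		return pte_copy();
	if (mode == MODE_ZEROPAGE)
		return pte_zeropage();

	if (!warned) {		/* VM_WARN_ONCE() stand-in */
		warned = 1;
		fprintf(stderr, "unexpected mfill mode %d\n", mode);
	}
	return -EOPNOTSUPP;
}
```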
> Feel free to add:
>
> Reviewed-by: James Houghton <[email protected]>
Thanks!
--
Sincerely yours,
Mike.