On Tue, Feb 03, 2026 at 12:40:26PM -0500, Peter Xu wrote:
> On Tue, Jan 27, 2026 at 09:29:29PM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <[email protected]>
> >
> > Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use
> > them in __mfill_atomic_pte() to add shmem folios to page cache and
> > remove them in case of error.
> >
> > Implement these methods in shmem along with vm_uffd_ops->alloc_folio()
> > and drop shmem_mfill_atomic_pte().
> >
> > Since userfaultfd now does not reference any functions from shmem, drop
> > the include of linux/shmem_fs.h from mm/userfaultfd.c.
> >
> > mfill_atomic_install_pte() is not used anywhere outside of
> > mm/userfaultfd, make it static.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
>
> This patch looks like a real nice cleanup on its own, thanks Mike!
>
> I guess I never tried to read into shmem accountings, now after I read some
> of the codes I don't see any issue with your change. We can also wait for
> some shmem developers to double check those. Comments inline below on
> something I spot.
>
> >
> > fixup
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
>
> These "fixup" and duplicated Signed-off-by lines look like leftovers from a
> rebase; they can be removed when reposting.
Sure :)
> > ---
> > include/linux/shmem_fs.h | 14 ----
> > include/linux/userfaultfd_k.h | 20 +++--
> > mm/shmem.c | 148 ++++++++++++----------------------
> > mm/userfaultfd.c | 79 +++++++++---------
> > 4 files changed, 106 insertions(+), 155 deletions(-)
> >
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index e2069b3179c4..754f17e5b53c 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -97,6 +97,21 @@ struct vm_uffd_ops {
> > */
> > struct folio *(*alloc_folio)(struct vm_area_struct *vma,
> > unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request.
> > + * Should lock the folio and add it to VMA's page cache.
> > + * Returns 0 on success, error code on failre.
>
> failure
Thanks, will fix.
> > + */
> > + int (*filemap_add)(struct folio *folio, struct vm_area_struct *vma,
> > + unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request on the error
> > + * handling path.
> > + * Should revert the operation of ->filemap_add().
> > + * The folio should be unlocked, but the reference to it should not be
> > + * dropped.
>
> Might be slightly misleading to explicitly mention this? As page cache
> also holds references and IIUC they need to be dropped there. But I get
> your point, on keeping the last refcount due to allocation.
>
> IMHO the "should revert the operation of ->filemap_add()" is good enough
> and accurately describes it.
Yeah, sounds good.
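To make it explicit who drops what, the caller in __mfill_atomic_pte() is meant
to look roughly like this (paraphrased, names approximate, not the literal
hunk):

	ret = uffd_ops->filemap_add(folio, dst_vma, dst_addr);
	if (ret)
		goto out_release;

	ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
				       &folio->page, true, flags);
	if (ret) {
		/* unlocks and drops the page cache reference */
		uffd_ops->filemap_remove(folio, dst_vma);
		goto out_release;
	}
	return 0;
out_release:
	folio_put(folio);	/* allocation reference dropped by the caller */
	return ret;

i.e. ->filemap_remove() only reverts ->filemap_add(); the last reference from
the allocation is always the caller's to drop.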
> > + */
> > + void (*filemap_remove)(struct folio *folio, struct vm_area_struct *vma);
> > };
> >
> > /* A combined operation mode + behavior flags. */
...
> > +static int shmem_mfill_filemap_add(struct folio *folio,
> > + struct vm_area_struct *vma,
> > + unsigned long addr)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > + struct address_space *mapping = inode->i_mapping;
> > + pgoff_t pgoff = linear_page_index(vma, addr);
> > + gfp_t gfp = mapping_gfp_mask(mapping);
> > + int err;
> > +
> > __folio_set_locked(folio);
> > __folio_set_swapbacked(folio);
> > - __folio_mark_uptodate(folio);
> > -
> > - ret = -EFAULT;
> > - max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> > - if (unlikely(pgoff >= max_off))
> > - goto out_release;
> >
> > - ret = mem_cgroup_charge(folio, dst_vma->vm_mm, gfp);
> > - if (ret)
> > - goto out_release;
> > - ret = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > - if (ret)
> > - goto out_release;
> > + err = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > + if (err)
> > + goto err_unlock;
> >
> > - ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> > - &folio->page, true, flags);
> > - if (ret)
> > - goto out_delete_from_cache;
> > + if (shmem_inode_acct_blocks(inode, 1)) {
>
> We used to do this early before allocation, IOW, I think we still have an
> option to leave this to alloc_folio() hook. However I don't see an issue
> either keeping it in filemap_add(). Maybe this movement should better be
> spelled out in the commit message anyway on how this decision is made.
>
> IIUC it's indeed safe we move this acct_blocks() here, I even see Hugh
> mentioned such in an older commit 3022fd7af96, but Hugh left uffd alone at
> that time:
>
> Userfaultfd is a foreign country: they do things differently there, and
> for good reason - to avoid mmap_lock deadlock. Leave ordering in
> shmem_mfill_atomic_pte() untouched for now, but I would rather like to
> mesh it better with shmem_get_folio_gfp() in the future.
>
> I'm not sure if that's also what you wanted to do - to make userfaultfd
> code work similarly like what shmem_alloc_and_add_folio() does right now.
> Maybe you want to mention that too somewhere in the commit log when posting
> a formal patch.
>
> One thing not directly relevant is, shmem_alloc_and_add_folio() also does
> proper recalc of inode allocation info when acct_blocks() fails here. But
> if that's a problem, that's pre-existing for userfaultfd, so IIUC we can
> also leave it alone until someone (maybe a quota user) complains about shmem
> allocation failures on UFFDIO_COPY.  It's just that it looks like a similar
> problem here in the userfaultfd path.
I actually wanted to have the ordering as close as possible to
shmem_alloc_and_add_folio(), and that's the first reason for moving
acct_blocks() to ->filemap_add().
Another reason is that it simplifies rollback in case of a failure, as
shmem_recalc_inode(inode, 0, 0) in ->filemap_remove() takes care of the block
accounting as well.
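Spelling the pairing out (condensed from the patch, error handling trimmed):

	->filemap_add():		/* UFFDIO_COPY, before PTE install */
		__folio_set_locked(folio);
		__folio_set_swapbacked(folio);
		shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
		shmem_inode_acct_blocks(inode, 1);
		folio_add_lru(folio);

	->filemap_remove():		/* error handling path only */
		filemap_remove_folio(folio);
		shmem_recalc_inode(inode, 0, 0); /* drops the block accounting too */
		folio_unlock(folio);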
> > + err = -ENOMEM;
> > + goto err_delete_from_cache;
> > + }
> >
> > + folio_add_lru(folio);
>
> This change is pretty separate from the work, but looks correct to me: IIUC
> we moved the lru add earlier now, and it should be safe as long as we're
> holding folio lock all through the process, and folio_put() (ultimately,
> __page_cache_release()) will always properly undo the lru change. Please
> help double check if my understanding is correct.
This follows shmem_alloc_and_add_folio(), and my understanding is also that
this is safe as long as we hold the folio lock.
> > +static void shmem_mfill_filemap_remove(struct folio *folio,
> > + struct vm_area_struct *vma)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > +
> > + filemap_remove_folio(folio);
> > + shmem_recalc_inode(inode, 0, 0);
> > folio_unlock(folio);
> > - folio_put(folio);
> > -out_unacct_blocks:
> > - shmem_inode_unacct_blocks(inode, 1);
>
> This looks wrong, or maybe I miss somewhere we did the unacct_blocks()?
This is handled by shmem_recalc_inode(inode, 0, 0): by the time it runs,
filemap_remove_folio() has already taken the folio out of the page cache, so
the recalc sees a block that is accounted but no longer backed by page cache
or swap and releases it via shmem_inode_unacct_blocks().
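For reference, the relevant part of shmem_recalc_inode() is roughly this
(simplified from memory, the writepage/quota special cases omitted):

	static void shmem_recalc_inode(struct inode *inode, long alloced,
				       long swapped)
	{
		struct shmem_inode_info *info = SHMEM_I(inode);
		long freed;

		spin_lock(&info->lock);
		info->alloced += alloced;
		info->swapped += swapped;
		/* blocks accounted but no longer backed by page cache or swap */
		freed = info->alloced - info->swapped -
			READ_ONCE(inode->i_mapping->nrpages);
		if (freed > 0)
			info->alloced -= freed;
		spin_unlock(&info->lock);
		if (freed > 0)
			shmem_inode_unacct_blocks(inode, freed);
	}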
> > @@ -401,6 +397,9 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> >
> > set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> >
> > + if (page_in_cache)
> > + folio_unlock(folio);
>
> Nitpick: another small change that looks correct, but IMHO would be nice to
> either make it a small separate patch, or mention in the commit message.
I'll address this in the commit log.
> > +
> > /* No need to invalidate - it was non-present before */
> > update_mmu_cache(dst_vma, dst_addr, dst_pte);
> > ret = 0;
--
Sincerely yours,
Mike.