On Tue, Feb 03, 2026 at 12:40:26PM -0500, Peter Xu wrote:
> On Tue, Jan 27, 2026 at 09:29:29PM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <[email protected]>
> > 
> > Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use
> > them in __mfill_atomic_pte() to add shmem folios to page cache and
> > remove them in case of error.
> > 
> > Implement these methods in shmem along with vm_uffd_ops->alloc_folio()
> > and drop shmem_mfill_atomic_pte().
> > 
> > Since userfaultfd now does not reference any functions from shmem, drop
> > the include of linux/shmem_fs.h from mm/userfaultfd.c.
> > 
> > mfill_atomic_install_pte() is not used anywhere outside of
> > mm/userfaultfd, make it static.
> > 
> > Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
> 
> This patch looks like a real nice cleanup on its own, thanks Mike!
> 
> I guess I never tried to read into shmem accounting; now that I've read
> some of the code I don't see any issue with your change.  We can also wait
> for some shmem developers to double check it.  Comments inline below on
> something I spotted.
> 
> > 
> > fixup
> > 
> > Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
> 
> [unexpected lines can be removed here]

Sure :)
 
> > ---
> >  include/linux/shmem_fs.h      |  14 ----
> >  include/linux/userfaultfd_k.h |  20 +++--
> >  mm/shmem.c                    | 148 ++++++++++++----------------------
> >  mm/userfaultfd.c              |  79 +++++++++---------
> >  4 files changed, 106 insertions(+), 155 deletions(-)
> > 
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index e2069b3179c4..754f17e5b53c 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -97,6 +97,21 @@ struct vm_uffd_ops {
> >      */
> >     struct folio *(*alloc_folio)(struct vm_area_struct *vma,
> >                                  unsigned long addr);
> > +   /*
> > +    * Called during resolution of UFFDIO_COPY request.
> > +    * Should lock the folio and add it to VMA's page cache.
> > +    * Returns 0 on success, error code on failre.
> 
> failure

Thanks, will fix.
 
> > +    */
> > +   int (*filemap_add)(struct folio *folio, struct vm_area_struct *vma,
> > +                    unsigned long addr);
> > +   /*
> > +    * Called during resolution of UFFDIO_COPY request on the error
> > +    * handling path.
> > +    * Should revert the operation of ->filemap_add().
> > +    * The folio should be unlocked, but the reference to it should not be
> > +    * dropped.
> 
> Might be slightly misleading to explicitly mention this?  As page cache
> also holds references and IIUC they need to be dropped there.  But I get
> your point, on keeping the last refcount due to allocation.
> 
> IMHO the "should revert the operation of ->filemap_add()" is good enough
> and accurately describes it.

Yeah, sounds good.
 
> > +    */
> > +   void (*filemap_remove)(struct folio *folio, struct vm_area_struct *vma);
> >  };
> >  
> >  /* A combined operation mode + behavior flags. */

...

> > +static int shmem_mfill_filemap_add(struct folio *folio,
> > +                              struct vm_area_struct *vma,
> > +                              unsigned long addr)
> > +{
> > +   struct inode *inode = file_inode(vma->vm_file);
> > +   struct address_space *mapping = inode->i_mapping;
> > +   pgoff_t pgoff = linear_page_index(vma, addr);
> > +   gfp_t gfp = mapping_gfp_mask(mapping);
> > +   int err;
> > +
> >     __folio_set_locked(folio);
> >     __folio_set_swapbacked(folio);
> > -   __folio_mark_uptodate(folio);
> > -
> > -   ret = -EFAULT;
> > -   max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> > -   if (unlikely(pgoff >= max_off))
> > -           goto out_release;
> >  
> > -   ret = mem_cgroup_charge(folio, dst_vma->vm_mm, gfp);
> > -   if (ret)
> > -           goto out_release;
> > -   ret = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > -   if (ret)
> > -           goto out_release;
> > +   err = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > +   if (err)
> > +           goto err_unlock;
> >  
> > -   ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> > -                                  &folio->page, true, flags);
> > -   if (ret)
> > -           goto out_delete_from_cache;
> > +   if (shmem_inode_acct_blocks(inode, 1)) {
> 
> We used to do this early before allocation, IOW, I think we still have an
> option to leave this to alloc_folio() hook.  However I don't see an issue
> either keeping it in filemap_add(). Maybe this movement should better be
> spelled out in the commit message anyway on how this decision is made.
> 
> IIUC it's indeed safe we move this acct_blocks() here, I even see Hugh
> mentioned such in an older commit 3022fd7af96, but Hugh left uffd alone at
> that time:
> 
>     Userfaultfd is a foreign country: they do things differently there, and
>     for good reason - to avoid mmap_lock deadlock.  Leave ordering in
>     shmem_mfill_atomic_pte() untouched for now, but I would rather like to
>     mesh it better with shmem_get_folio_gfp() in the future.
> 
> I'm not sure if that's also what you wanted to do - to make userfaultfd
> code work similarly like what shmem_alloc_and_add_folio() does right now.
> Maybe you want to mention that too somewhere in the commit log when posting
> a formal patch.
> 
> One thing not directly relevant is, shmem_alloc_and_add_folio() also does
> proper recalc of inode allocation info when acct_blocks() fails here.  But
> if that's a problem, that's pre-existing for userfaultfd, so IIUC we can
> also leave it alone until someone (maybe quota user) complains about shmem
> allocation failures on UFFDIO_COPY.. It's just that it looks similar
> problem here in userfaultfd path.

I actually wanted to have the ordering as close as possible to
shmem_alloc_and_add_folio(); that's the first reason for moving acct_blocks
to ->filemap_add().
Another reason is that it simplifies rollback in case of a failure, as the
shmem_recalc_inode(inode, 0, 0) in ->filemap_remove() takes care of the
block accounting as well.
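The ordering in the new ->filemap_add() can be condensed into a toy unwind
model (the step names follow the diff above; the fail_at parameter and the
stub bodies are invented so the unwind can be exercised in userspace):

```c
#include <assert.h>

/* Pick which step fails so the unwind path can be checked. Toy model only. */
enum { STEP_CACHE_ADD = 1, STEP_ACCT = 2 };

static int model_filemap_add(int fail_at, int *in_cache, int *accted,
			     int *on_lru)
{
	/* 1. shmem_add_to_page_cache() */
	if (fail_at == STEP_CACHE_ADD)
		return -1;		/* err_unlock */
	*in_cache = 1;

	/* 2. shmem_inode_acct_blocks(), moved here from before allocation */
	if (fail_at == STEP_ACCT) {
		*in_cache = 0;		/* err_delete_from_cache */
		return -1;		/* -ENOMEM */
	}
	*accted = 1;

	/* 3. folio_add_lru(), done while the folio lock is still held */
	*on_lru = 1;
	return 0;
}
```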

> > +           err = -ENOMEM;
> > +           goto err_delete_from_cache;
> > +   }
> >  
> > +   folio_add_lru(folio);
> 
> This change is pretty separate from the work, but looks correct to me: IIUC
> we moved the lru add earlier now, and it should be safe as long as we're
> holding folio lock all through the process, and folio_put() (ultimately,
> __page_cache_release()) will always properly undo the lru change.  Please
> help double check if my understanding is correct.

This follows shmem_alloc_and_add_folio(), and my understanding as well is
that this is safe as long as we hold the folio lock.

> > +static void shmem_mfill_filemap_remove(struct folio *folio,
> > +                                  struct vm_area_struct *vma)
> > +{
> > +   struct inode *inode = file_inode(vma->vm_file);
> > +
> > +   filemap_remove_folio(folio);
> > +   shmem_recalc_inode(inode, 0, 0);
> >     folio_unlock(folio);
> > -   folio_put(folio);
> > -out_unacct_blocks:
> > -   shmem_inode_unacct_blocks(inode, 1);
> 
> This looks wrong, or maybe I miss somewhere we did the unacct_blocks()?

This is handled by shmem_recalc_inode(inode, 0, 0).
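To spell out why no explicit unacct_blocks() is needed, here is a userspace
model of the reconciliation; the arithmetic follows my reading of
shmem_recalc_inode() (freed = alloced - swapped - nrpages), and the counter
names and everything else are invented for the sketch:

```c
#include <assert.h>

/* Toy inode accounting state; names loosely follow shmem_inode_info. */
struct toy_inode {
	long alloced;	/* pages charged to this inode */
	long swapped;	/* pages currently in swap */
	long nrpages;	/* pages currently in the page cache */
	long accounted;	/* blocks held via shmem_inode_acct_blocks() */
};

static void toy_acct_blocks(struct toy_inode *i, long n)
{
	i->accounted += n;
	i->alloced += n;
}

/* Sketch of the reconciliation done by shmem_recalc_inode(inode, 0, 0):
 * alloced pages no longer in the cache or in swap are unaccounted. */
static void toy_recalc(struct toy_inode *i)
{
	long freed = i->alloced - i->swapped - i->nrpages;

	if (freed > 0) {
		i->alloced -= freed;
		i->accounted -= freed;	/* the implicit unacct_blocks() */
	}
}
```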
 
> > @@ -401,6 +397,9 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> >  
> >     set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> >  
> > +   if (page_in_cache)
> > +           folio_unlock(folio);
> 
> Nitpick: another small change that looks correct, but IMHO would be nice to
> either make it a small separate patch, or mention in the commit message.

I'll address this in the commit log.
 
> > +
> >     /* No need to invalidate - it was non-present before */
> >     update_mmu_cache(dst_vma, dst_addr, dst_pte);
> >     ret = 0;

-- 
Sincerely yours,
Mike.
