On Mon, Jan 26, 2026 at 04:56:42PM -0800, Samuel Wu wrote:
> On Mon, Nov 17, 2025 at 9:15 PM Al Viro <[email protected]> wrote:
> >
> > Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
> > dentries of created objects in dcache (and undo it when removing those).
> > Reference is grabbed and not released, but it's not actually _stored_
> > anywhere.  That works, but it's hard to follow and verify; among other
> > things, we have no way to tell _which_ of the increments is intended
> > to be an unpaired one.  Worse, on removal we need to decide whether
> > the reference had already been dropped, which can be non-trivial if
> > that removal is on umount and we need to figure out if this dentry is
> > pinned due to e.g. unlink() not done.  Usually that is handled by using
> > kill_litter_super() as ->kill_sb(), but there are open-coded special
> > cases of the same (consider e.g. /proc/self).
> >
> > Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
> > marking those "leaked" dentries.  Having it set claims responsibility
> > for +1 in refcount.
> >
> > The end result this series is aiming for:
> >
> > * get these unbalanced dget() and dput() replaced with new primitives that
> >   would, in addition to adjusting refcount, set and clear persistency flag.
> > * instead of having kill_litter_super() mess with removing the remaining
> >   "leaked" references (e.g. for all tmpfs files that hadn't been removed
> >   prior to umount), have the regular shrink_dcache_for_umount() strip
> >   DCACHE_PERSISTENT of all dentries, dropping the corresponding
> >   reference if it had been set.  After that kill_litter_super() becomes
> >   an equivalent of kill_anon_super().
> >
> > Doing that in a single step is not feasible - it would affect too many 
> > places
> > in too many filesystems.  It has to be split into a series.
> >
> > This work has really started early in 2024; quite a few preliminary pieces
> > have already gone into mainline.  This chunk is finally getting to the
> > meat of that stuff - infrastructure and most of the conversions to it.
> >
> > Some pieces are still sitting in the local branches, but the bulk of
> > that stuff is here.
> >
> > Compared to v3:
> >         * fixed a functionfs braino around ffs_epfiles_destroy() (in #40/54,
> > used to be #36/50).
> >         * added fixes for a couple of UAF in functionfs (##36--39); that
> > does *NOT* include any fixes for dmabuf bugs Chris posted last week, though.
> >
> > The branch is -rc5-based; it lives in
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.persistency
> > individual patches in followups.
> >
> > Please, help with review and testing.  If nobody objects, in a few days it
> > goes into #for-next.
> >
> > Shortlog:
> >       fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> >       tracefs: fix a leak in eventfs_create_events_dir()
> >       new helper: simple_remove_by_name()
> >       new helper: simple_done_creating()
> >       introduce a flag for explicitly marking persistently pinned dentries
> >       primitives for maintaining persisitency
> >       convert simple_{link,unlink,rmdir,rename,fill_super}() to new 
> > primitives
> >       convert ramfs and tmpfs
> >       procfs: make /self and /thread_self dentries persistent
> >       configfs, securityfs: kill_litter_super() not needed
> >       convert xenfs
> >       convert smackfs
> >       convert hugetlbfs
> >       convert mqueue
> >       convert bpf
> >       convert dlmfs
> >       convert fuse_ctl
> >       convert pstore
> >       convert tracefs
> >       convert debugfs
> >       debugfs: remove duplicate checks in callers of start_creating()
> >       convert efivarfs
> >       convert spufs
> >       convert ibmasmfs
> >       ibmasmfs: get rid of ibmasmfs_dir_ops
> >       convert devpts
> >       binderfs: use simple_start_creating()
> >       binderfs_binder_ctl_create(): kill a bogus check
> >       convert binderfs
> >       autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> >       convert autofs
> >       convert binfmt_misc
> >       selinuxfs: don't stash the dentry of /policy_capabilities
> >       selinuxfs: new helper for attaching files to tree
> >       convert selinuxfs
> >       functionfs: don't abuse ffs_data_closed() on fs shutdown
> >       functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> >       functionfs: need to cancel ->reset_work in ->kill_sb()
> >       functionfs: fix the open/removal races
> >       functionfs: switch to simple_remove_by_name()
> >       convert functionfs
> >       gadgetfs: switch to simple_remove_by_name()
> >       convert gadgetfs
> >       hypfs: don't pin dentries twice
> >       hypfs: switch hypfs_create_str() to returning int
> >       hypfs: swich hypfs_create_u64() to returning int
> >       convert hypfs
> >       convert rpc_pipefs
> >       convert nfsctl
> >       convert rust_binderfs
> >       get rid of kill_litter_super()
> >       convert securityfs
> >       kill securityfs_recursive_remove()
> >       d_make_discardable(): warn if given a non-persistent dentry
> >
> > Diffstat:
> >  Documentation/filesystems/porting.rst     |   7 ++
> >  arch/powerpc/platforms/cell/spufs/inode.c |  17 ++-
> >  arch/s390/hypfs/hypfs.h                   |   6 +-
> >  arch/s390/hypfs/hypfs_diag_fs.c           |  60 ++++------
> >  arch/s390/hypfs/hypfs_vm_fs.c             |  21 ++--
> >  arch/s390/hypfs/inode.c                   |  82 +++++--------
> >  drivers/android/binder/rust_binderfs.c    | 121 ++++++-------------
> >  drivers/android/binderfs.c                |  82 +++----------
> >  drivers/base/devtmpfs.c                   |   2 +-
> >  drivers/misc/ibmasm/ibmasmfs.c            |  24 ++--
> >  drivers/usb/gadget/function/f_fs.c        | 144 +++++++++++++----------
> >  drivers/usb/gadget/legacy/inode.c         |  49 ++++----
> >  drivers/xen/xenfs/super.c                 |   2 +-
> >  fs/autofs/inode.c                         |   2 +-
> >  fs/autofs/root.c                          |  11 +-
> >  fs/binfmt_misc.c                          |  69 ++++++-----
> >  fs/configfs/dir.c                         |  10 +-
> >  fs/configfs/inode.c                       |   3 +-
> >  fs/configfs/mount.c                       |   2 +-
> >  fs/dcache.c                               | 111 +++++++++++-------
> >  fs/debugfs/inode.c                        |  32 ++----
> >  fs/devpts/inode.c                         |  57 ++++-----
> >  fs/efivarfs/inode.c                       |   7 +-
> >  fs/efivarfs/super.c                       |   5 +-
> >  fs/fuse/control.c                         |  38 +++---
> >  fs/hugetlbfs/inode.c                      |  12 +-
> >  fs/internal.h                             |   1 -
> >  fs/libfs.c                                |  52 +++++++--
> >  fs/nfsd/nfsctl.c                          |  18 +--
> >  fs/ocfs2/dlmfs/dlmfs.c                    |   8 +-
> >  fs/proc/base.c                            |   6 +-
> >  fs/proc/internal.h                        |   1 +
> >  fs/proc/root.c                            |  14 +--
> >  fs/proc/self.c                            |  10 +-
> >  fs/proc/thread_self.c                     |  11 +-
> >  fs/pstore/inode.c                         |   7 +-
> >  fs/ramfs/inode.c                          |   8 +-
> >  fs/super.c                                |   8 --
> >  fs/tracefs/event_inode.c                  |   7 +-
> >  fs/tracefs/inode.c                        |  13 +--
> >  include/linux/dcache.h                    |   4 +-
> >  include/linux/fs.h                        |   6 +-
> >  include/linux/proc_fs.h                   |   2 -
> >  include/linux/security.h                  |   2 -
> >  init/do_mounts.c                          |   2 +-
> >  ipc/mqueue.c                              |  12 +-
> >  kernel/bpf/inode.c                        |  15 +--
> >  mm/shmem.c                                |  38 ++----
> >  net/sunrpc/rpc_pipe.c                     |  27 ++---
> >  security/apparmor/apparmorfs.c            |  13 ++-
> >  security/inode.c                          |  35 +++---
> >  security/selinux/selinuxfs.c              | 185 
> > +++++++++++++-----------------
> >  security/smack/smackfs.c                  |   2 +-
> >  53 files changed, 649 insertions(+), 834 deletions(-)
> >
> >         Overview:
> >
> > First two commits are bugfixes (fusectl and tracefs resp.)
> >
> > [1/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> > [2/54] tracefs: fix a leak in eventfs_create_events_dir()
> >
> > Next, two commits adding a couple of useful helpers, the next three adding
> > the infrastructure and the rest consists of per-filesystem conversions.
> >
> > [3/54] new helper: simple_remove_by_name()
> > [4/54] new helper: simple_done_creating()
> >         end_creating_path() analogue for internal object creation; unlike
> > end_creating_path() no mount is passed to it (or guaranteed to exist, for
> > that matter - it might be used during the filesystem setup, before the
> > superblock gets attached to any mounts).
> >
> > Infrastructure:
> > [5/54] introduce a flag for explicitly marking persistently pinned dentries
> >         * introduce the new flag
> >         * teach shrink_dcache_for_umount() to handle it (i.e. remove
> > and drop refcount on anything that survives to umount with that flag
> > still set)
> >         * teach kill_litter_super() that anything with that flag does
> > *not* need to be unpinned.
> > [6/54] primitives for maintaining persisitency
> >         * d_make_persistent(dentry, inode) - bump refcount, mark persistent
> > and make hashed positive.  Return value is a borrowed reference to dentry;
> > it can be used until something removes persistency (at the very least,
> > until the parent gets unlocked, but some filesystems may have stronger
> > exclusion).
> >         * d_make_discardable() - remove persistency mark and drop reference.
> >
> > NOTE: at that stage d_make_discardable() does not reject dentries not
> > marked persistent - it acts as if the mark been set.
> >
> > Rationale: less noise in series splitup that way.  We want (and on the
> > next commit will get) simple_unlink() to do the right thing - remove
> > persistency, if it's there.  However, it's used by many filesystems.
> > We would have either to convert them all at once or split simple_unlink()
> > into "want persistent" and "don't want persistent" versions, the latter
> > being the old one.  In the course of the series almost all callers
> > would migrate to the replacement, leaving only two pathological cases
> > with the old one.  The same goes for simple_rmdir() (two callers left in
> > the end), simple_recursive_removal() (all callers gone in the end), etc.
> > That's a lot of noise and it's easier to start with d_make_discardable()
> > quietly accepting non-persistent dentries, then, in the end, add private
> > copies of simple_unlink() and simple_rmdir() for two weird users (configfs
> > and apparmorfs) and have those use dput() instead of d_make_discardable().
> > At that point we'd be left with all callers of d_make_discardable()
> > always passing persistent dentries, allowing to add a warning in it.
> >
> > [7/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new 
> > primitives
> >         See above re quietly accepting non-peristent dentries in
> > simple_unlink(), simple_rmdir(), etc.
> >
> >         Converting filesystems:
> > [8/54] convert ramfs and tmpfs
> > [9/54] procfs: make /self and /thread_self dentries persistent
> > [10/54] configfs, securityfs: kill_litter_super() not needed
> > [11/54] convert xenfs
> > [12/54] convert smackfs
> > [13/54] convert hugetlbfs
> > [14/54] convert mqueue
> > [15/54] convert bpf
> > [16/54] convert dlmfs
> > [17/54] convert fuse_ctl
> > [18/54] convert pstore
> > [19/54] convert tracefs
> > [20/54] convert debugfs
> > [21/54] debugfs: remove duplicate checks in callers of start_creating()
> > [22/54] convert efivarfs
> > [23/54] convert spufs
> > [24/54] convert ibmasmfs
> > [25/54] ibmasmfs: get rid of ibmasmfs_dir_ops
> > [26/54] convert devpts
> > [27/54] binderfs: use simple_start_creating()
> > [28/54] binderfs_binder_ctl_create(): kill a bogus check
> > [29/54] convert binderfs
> > [30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> > [31/54] convert autofs
> > [32/54] convert binfmt_misc
> > [33/54] selinuxfs: don't stash the dentry of /policy_capabilities
> > [34/54] selinuxfs: new helper for attaching files to tree
> > [35/54] convert selinuxfs
> >
> >         Several functionfs fixes, before converting it, to make life
> > simpler for backporting:
> > [36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown
> > [37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> > [38/54] functionfs: need to cancel ->reset_work in ->kill_sb()
> > [39/54] functionfs: fix the open/removal races
> >
> >         ... and back to filesystems conversions:
> >
> > [40/54] functionfs: switch to simple_remove_by_name()
> > [41/54] convert functionfs
> > [42/54] gadgetfs: switch to simple_remove_by_name()
> > [43/54] convert gadgetfs
> > [44/54] hypfs: don't pin dentries twice
> > [45/54] hypfs: switch hypfs_create_str() to returning int
> > [46/54] hypfs: swich hypfs_create_u64() to returning int
> > [47/54] convert hypfs
> > [48/54] convert rpc_pipefs
> > [49/54] convert nfsctl
> > [50/54] convert rust_binderfs
> >
> >         ... and no kill_litter_super() callers remain, so we
> > can take it out:
> > [51/54] get rid of kill_litter_super()
> >
> >         Followups:
> > [52/54] convert securityfs
> >         That was the last remaining user of simple_recursive_removal()
> > that did *not* mark things persistent.  Now the only places where
> > d_make_discardable() is still called for dentries that are not marked
> > persistent are the calls of simple_{unlink,rmdir}() in configfs and
> > apparmorfs.
> >
> > [53/54] kill securityfs_recursive_remove()
> >         Unused macro...
> >
> > [54/54] d_make_discardable(): warn if given a non-persistent dentry
> >
> > At this point there are very few call chains that might lead to
> > d_make_discardable() on a dentry that hadn't been made persistent:
> > calls of simple_unlink() and simple_rmdir() in configfs and
> > apparmorfs.
> >
> > Both filesystems do pin (part of) their contents in dcache, but
> > they are currently playing very unusual games with that.  Converting
> > them to more usual patterns might be possible, but it's definitely
> > going to be a long series of changes in both cases.
> >
> > For now the easiest solution is to have both stop using simple_unlink()
> > and simple_rmdir() - that allows to make d_make_discardable() warn
> > when given a non-persistent dentry.
> >
> > Rather than giving them full-blown private copies (with calls of
> > d_make_discardable() replaced with dput()), let's pull the parts of
> > simple_unlink() and simple_rmdir() that deal with timestamps and link
> > counts into separate helpers (__simple_unlink() and __simple_rmdir()
> > resp.) and have those used by configfs and apparmorfs.
> >
> 
> Hi Al, when I apply this patchset my Pixel 6 no longer enumerates on
> lsusb or ADB. It was quite hard to bisect to this point, as this is
> non-deterministic and seems to be setup specific. Note, I am using
> android-mainline, but my understanding is that this build does not
> have any out-of-tree USB patches, and that there are no vendor hooks
> in the build.
> 
> My apologies as I can't offer any other clues; there are no obviously
> bad dmesg logs and I'm still working on narrowing down the exact
> commit(s) that started this, but just wanted to send a FYI in case
> something stands out as obvious.

Note that I had to revert commit e5bf5ee26663 ("functionfs: fix the
open/removal races") from the stable backports, as it was causing issues
on the pixel devices it got backported to.  So perhaps look there?

thanks,

greg k-h

Reply via email to