On Mon, Jan 26, 2026 at 04:56:42PM -0800, Samuel Wu wrote:
> On Mon, Nov 17, 2025 at 9:15 PM Al Viro <[email protected]> wrote:
> >
> > Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
> > dentries of created objects in dcache (and undo it when removing those).
> > Reference is grabbed and not released, but it's not actually _stored_
> > anywhere. That works, but it's hard to follow and verify; among other
> > things, we have no way to tell _which_ of the increments is intended
> > to be an unpaired one. Worse, on removal we need to decide whether
> > the reference had already been dropped, which can be non-trivial if
> > that removal is on umount and we need to figure out if this dentry is
> > pinned due to e.g. unlink() not done. Usually that is handled by using
> > kill_litter_super() as ->kill_sb(), but there are open-coded special
> > cases of the same (consider e.g. /proc/self).
> >
> > Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
> > marking those "leaked" dentries. Having it set claims responsibility
> > for +1 in refcount.
> >
> > The end result this series is aiming for:
> >
> > * get these unbalanced dget() and dput() replaced with new primitives that
> > would, in addition to adjusting refcount, set and clear persistency flag.
> > * instead of having kill_litter_super() mess with removing the remaining
> > "leaked" references (e.g. for all tmpfs files that hadn't been removed
> > prior to umount), have the regular shrink_dcache_for_umount() strip
> > DCACHE_PERSISTENT of all dentries, dropping the corresponding
> > reference if it had been set. After that kill_litter_super() becomes
> > an equivalent of kill_anon_super().
> >
> > Doing that in a single step is not feasible - it would affect too many
> > places
> > in too many filesystems. It has to be split into a series.
> >
> > This work has really started early in 2024; quite a few preliminary pieces
> > have already gone into mainline. This chunk is finally getting to the
> > meat of that stuff - infrastructure and most of the conversions to it.
> >
> > Some pieces are still sitting in the local branches, but the bulk of
> > that stuff is here.
> >
> > Compared to v3:
> > * fixed a functionfs braino around ffs_epfiles_destroy() (in #40/54,
> > used to be #36/50).
> > * added fixes for a couple of UAF in functionfs (##36--39); that
> > does *NOT* include any fixes for dmabuf bugs Chris posted last week, though.
> >
> > The branch is -rc5-based; it lives in
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.persistency
> > individual patches in followups.
> >
> > Please, help with review and testing. If nobody objects, in a few days it
> > goes into #for-next.
> >
> > Shortlog:
> > fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> > tracefs: fix a leak in eventfs_create_events_dir()
> > new helper: simple_remove_by_name()
> > new helper: simple_done_creating()
> > introduce a flag for explicitly marking persistently pinned dentries
> > primitives for maintaining persisitency
> > convert simple_{link,unlink,rmdir,rename,fill_super}() to new
> > primitives
> > convert ramfs and tmpfs
> > procfs: make /self and /thread_self dentries persistent
> > configfs, securityfs: kill_litter_super() not needed
> > convert xenfs
> > convert smackfs
> > convert hugetlbfs
> > convert mqueue
> > convert bpf
> > convert dlmfs
> > convert fuse_ctl
> > convert pstore
> > convert tracefs
> > convert debugfs
> > debugfs: remove duplicate checks in callers of start_creating()
> > convert efivarfs
> > convert spufs
> > convert ibmasmfs
> > ibmasmfs: get rid of ibmasmfs_dir_ops
> > convert devpts
> > binderfs: use simple_start_creating()
> > binderfs_binder_ctl_create(): kill a bogus check
> > convert binderfs
> > autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> > convert autofs
> > convert binfmt_misc
> > selinuxfs: don't stash the dentry of /policy_capabilities
> > selinuxfs: new helper for attaching files to tree
> > convert selinuxfs
> > functionfs: don't abuse ffs_data_closed() on fs shutdown
> > functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> > functionfs: need to cancel ->reset_work in ->kill_sb()
> > functionfs: fix the open/removal races
> > functionfs: switch to simple_remove_by_name()
> > convert functionfs
> > gadgetfs: switch to simple_remove_by_name()
> > convert gadgetfs
> > hypfs: don't pin dentries twice
> > hypfs: switch hypfs_create_str() to returning int
> > hypfs: swich hypfs_create_u64() to returning int
> > convert hypfs
> > convert rpc_pipefs
> > convert nfsctl
> > convert rust_binderfs
> > get rid of kill_litter_super()
> > convert securityfs
> > kill securityfs_recursive_remove()
> > d_make_discardable(): warn if given a non-persistent dentry
> >
> > Diffstat:
> > Documentation/filesystems/porting.rst | 7 ++
> > arch/powerpc/platforms/cell/spufs/inode.c | 17 ++-
> > arch/s390/hypfs/hypfs.h | 6 +-
> > arch/s390/hypfs/hypfs_diag_fs.c | 60 ++++------
> > arch/s390/hypfs/hypfs_vm_fs.c | 21 ++--
> > arch/s390/hypfs/inode.c | 82 +++++--------
> > drivers/android/binder/rust_binderfs.c | 121 ++++++-------------
> > drivers/android/binderfs.c | 82 +++----------
> > drivers/base/devtmpfs.c | 2 +-
> > drivers/misc/ibmasm/ibmasmfs.c | 24 ++--
> > drivers/usb/gadget/function/f_fs.c | 144 +++++++++++++----------
> > drivers/usb/gadget/legacy/inode.c | 49 ++++----
> > drivers/xen/xenfs/super.c | 2 +-
> > fs/autofs/inode.c | 2 +-
> > fs/autofs/root.c | 11 +-
> > fs/binfmt_misc.c | 69 ++++++-----
> > fs/configfs/dir.c | 10 +-
> > fs/configfs/inode.c | 3 +-
> > fs/configfs/mount.c | 2 +-
> > fs/dcache.c | 111 +++++++++++-------
> > fs/debugfs/inode.c | 32 ++----
> > fs/devpts/inode.c | 57 ++++-----
> > fs/efivarfs/inode.c | 7 +-
> > fs/efivarfs/super.c | 5 +-
> > fs/fuse/control.c | 38 +++---
> > fs/hugetlbfs/inode.c | 12 +-
> > fs/internal.h | 1 -
> > fs/libfs.c | 52 +++++++--
> > fs/nfsd/nfsctl.c | 18 +--
> > fs/ocfs2/dlmfs/dlmfs.c | 8 +-
> > fs/proc/base.c | 6 +-
> > fs/proc/internal.h | 1 +
> > fs/proc/root.c | 14 +--
> > fs/proc/self.c | 10 +-
> > fs/proc/thread_self.c | 11 +-
> > fs/pstore/inode.c | 7 +-
> > fs/ramfs/inode.c | 8 +-
> > fs/super.c | 8 --
> > fs/tracefs/event_inode.c | 7 +-
> > fs/tracefs/inode.c | 13 +--
> > include/linux/dcache.h | 4 +-
> > include/linux/fs.h | 6 +-
> > include/linux/proc_fs.h | 2 -
> > include/linux/security.h | 2 -
> > init/do_mounts.c | 2 +-
> > ipc/mqueue.c | 12 +-
> > kernel/bpf/inode.c | 15 +--
> > mm/shmem.c | 38 ++----
> > net/sunrpc/rpc_pipe.c | 27 ++---
> > security/apparmor/apparmorfs.c | 13 ++-
> > security/inode.c | 35 +++---
> > security/selinux/selinuxfs.c | 185
> > +++++++++++++-----------------
> > security/smack/smackfs.c | 2 +-
> > 53 files changed, 649 insertions(+), 834 deletions(-)
> >
> > Overview:
> >
> > First two commits are bugfixes (fusectl and tracefs resp.)
> >
> > [1/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> > [2/54] tracefs: fix a leak in eventfs_create_events_dir()
> >
> > Next, two commits adding a couple of useful helpers, the next three adding
> > the infrastructure and the rest consists of per-filesystem conversions.
> >
> > [3/54] new helper: simple_remove_by_name()
> > [4/54] new helper: simple_done_creating()
> > end_creating_path() analogue for internal object creation; unlike
> > end_creating_path() no mount is passed to it (or guaranteed to exist, for
> > that matter - it might be used during the filesystem setup, before the
> > superblock gets attached to any mounts).
> >
> > Infrastructure:
> > [5/54] introduce a flag for explicitly marking persistently pinned dentries
> > * introduce the new flag
> > * teach shrink_dcache_for_umount() to handle it (i.e. remove
> > and drop refcount on anything that survives to umount with that flag
> > still set)
> > * teach kill_litter_super() that anything with that flag does
> > *not* need to be unpinned.
> > [6/54] primitives for maintaining persisitency
> > * d_make_persistent(dentry, inode) - bump refcount, mark persistent
> > and make hashed positive. Return value is a borrowed reference to dentry;
> > it can be used until something removes persistency (at the very least,
> > until the parent gets unlocked, but some filesystems may have stronger
> > exclusion).
> > * d_make_discardable() - remove persistency mark and drop reference.
> >
> > NOTE: at that stage d_make_discardable() does not reject dentries not
> > marked persistent - it acts as if the mark been set.
> >
> > Rationale: less noise in series splitup that way. We want (and on the
> > next commit will get) simple_unlink() to do the right thing - remove
> > persistency, if it's there. However, it's used by many filesystems.
> > We would have either to convert them all at once or split simple_unlink()
> > into "want persistent" and "don't want persistent" versions, the latter
> > being the old one. In the course of the series almost all callers
> > would migrate to the replacement, leaving only two pathological cases
> > with the old one. The same goes for simple_rmdir() (two callers left in
> > the end), simple_recursive_removal() (all callers gone in the end), etc.
> > That's a lot of noise and it's easier to start with d_make_discardable()
> > quietly accepting non-persistent dentries, then, in the end, add private
> > copies of simple_unlink() and simple_rmdir() for two weird users (configfs
> > and apparmorfs) and have those use dput() instead of d_make_discardable().
> > At that point we'd be left with all callers of d_make_discardable()
> > always passing persistent dentries, allowing to add a warning in it.
> >
> > [7/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new
> > primitives
> > See above re quietly accepting non-peristent dentries in
> > simple_unlink(), simple_rmdir(), etc.
> >
> > Converting filesystems:
> > [8/54] convert ramfs and tmpfs
> > [9/54] procfs: make /self and /thread_self dentries persistent
> > [10/54] configfs, securityfs: kill_litter_super() not needed
> > [11/54] convert xenfs
> > [12/54] convert smackfs
> > [13/54] convert hugetlbfs
> > [14/54] convert mqueue
> > [15/54] convert bpf
> > [16/54] convert dlmfs
> > [17/54] convert fuse_ctl
> > [18/54] convert pstore
> > [19/54] convert tracefs
> > [20/54] convert debugfs
> > [21/54] debugfs: remove duplicate checks in callers of start_creating()
> > [22/54] convert efivarfs
> > [23/54] convert spufs
> > [24/54] convert ibmasmfs
> > [25/54] ibmasmfs: get rid of ibmasmfs_dir_ops
> > [26/54] convert devpts
> > [27/54] binderfs: use simple_start_creating()
> > [28/54] binderfs_binder_ctl_create(): kill a bogus check
> > [29/54] convert binderfs
> > [30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> > [31/54] convert autofs
> > [32/54] convert binfmt_misc
> > [33/54] selinuxfs: don't stash the dentry of /policy_capabilities
> > [34/54] selinuxfs: new helper for attaching files to tree
> > [35/54] convert selinuxfs
> >
> > Several functionfs fixes, before converting it, to make life
> > simpler for backporting:
> > [36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown
> > [37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> > [38/54] functionfs: need to cancel ->reset_work in ->kill_sb()
> > [39/54] functionfs: fix the open/removal races
> >
> > ... and back to filesystems conversions:
> >
> > [40/54] functionfs: switch to simple_remove_by_name()
> > [41/54] convert functionfs
> > [42/54] gadgetfs: switch to simple_remove_by_name()
> > [43/54] convert gadgetfs
> > [44/54] hypfs: don't pin dentries twice
> > [45/54] hypfs: switch hypfs_create_str() to returning int
> > [46/54] hypfs: swich hypfs_create_u64() to returning int
> > [47/54] convert hypfs
> > [48/54] convert rpc_pipefs
> > [49/54] convert nfsctl
> > [50/54] convert rust_binderfs
> >
> > ... and no kill_litter_super() callers remain, so we
> > can take it out:
> > [51/54] get rid of kill_litter_super()
> >
> > Followups:
> > [52/54] convert securityfs
> > That was the last remaining user of simple_recursive_removal()
> > that did *not* mark things persistent. Now the only places where
> > d_make_discardable() is still called for dentries that are not marked
> > persistent are the calls of simple_{unlink,rmdir}() in configfs and
> > apparmorfs.
> >
> > [53/54] kill securityfs_recursive_remove()
> > Unused macro...
> >
> > [54/54] d_make_discardable(): warn if given a non-persistent dentry
> >
> > At this point there are very few call chains that might lead to
> > d_make_discardable() on a dentry that hadn't been made persistent:
> > calls of simple_unlink() and simple_rmdir() in configfs and
> > apparmorfs.
> >
> > Both filesystems do pin (part of) their contents in dcache, but
> > they are currently playing very unusual games with that. Converting
> > them to more usual patterns might be possible, but it's definitely
> > going to be a long series of changes in both cases.
> >
> > For now the easiest solution is to have both stop using simple_unlink()
> > and simple_rmdir() - that allows to make d_make_discardable() warn
> > when given a non-persistent dentry.
> >
> > Rather than giving them full-blown private copies (with calls of
> > d_make_discardable() replaced with dput()), let's pull the parts of
> > simple_unlink() and simple_rmdir() that deal with timestamps and link
> > counts into separate helpers (__simple_unlink() and __simple_rmdir()
> > resp.) and have those used by configfs and apparmorfs.
> >
>
> Hi Al, when I apply this patchset my Pixel 6 no longer enumerates on
> lsusb or ADB. It was quite hard to bisect to this point, as this is
> non-deterministic and seems to be setup specific. Note, I am using
> android-mainline, but my understanding is that this build does not
> have any out-of-tree USB patches, and that there are no vendor hooks
> in the build.
>
> My apologies as I can't offer any other clues; there are no obviously
> bad dmesg logs and I'm still working on narrowing down the exact
> commit(s) that started this, but just wanted to send a FYI in case
> something stands out as obvious.
Note that I had to revert commit e5bf5ee26663 ("functionfs: fix the
open/removal races") from the stable backports, as it was causing issues
on the pixel devices it got backported to. So perhaps look there?
thanks,
greg k-h