On Mon, Nov 17, 2025 at 9:15 PM Al Viro <[email protected]> wrote:
>
> Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
> dentries of created objects in dcache (and undo it when removing those).
> Reference is grabbed and not released, but it's not actually _stored_
> anywhere. That works, but it's hard to follow and verify; among other
> things, we have no way to tell _which_ of the increments is intended
> to be an unpaired one. Worse, on removal we need to decide whether
> the reference had already been dropped, which can be non-trivial if
> that removal is on umount and we need to figure out if this dentry is
> pinned due to e.g. unlink() not done. Usually that is handled by using
> kill_litter_super() as ->kill_sb(), but there are open-coded special
> cases of the same (consider e.g. /proc/self).
>
> Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
> marking those "leaked" dentries. Having it set claims responsibility
> for +1 in refcount.
>
> The end result this series is aiming for:
>
> * get these unbalanced dget() and dput() replaced with new primitives that
>   would, in addition to adjusting refcount, set and clear persistency flag.
> * instead of having kill_litter_super() mess with removing the remaining
>   "leaked" references (e.g. for all tmpfs files that hadn't been removed
>   prior to umount), have the regular shrink_dcache_for_umount() strip
>   DCACHE_PERSISTENT of all dentries, dropping the corresponding
>   reference if it had been set. After that kill_litter_super() becomes
>   an equivalent of kill_anon_super().
>
> Doing that in a single step is not feasible - it would affect too many places
> in too many filesystems. It has to be split into a series.
>
> This work has really started early in 2024; quite a few preliminary pieces
> have already gone into mainline. This chunk is finally getting to the
> meat of that stuff - infrastructure and most of the conversions to it.
>
> Some pieces are still sitting in the local branches, but the bulk of
> that stuff is here.
>
> Compared to v3:
> * fixed a functionfs braino around ffs_epfiles_destroy() (in #40/54,
>   used to be #36/50).
> * added fixes for a couple of UAF in functionfs (##36--39); that
>   does *NOT* include any fixes for dmabuf bugs Chris posted last week, though.
>
> The branch is -rc5-based; it lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.persistency
> individual patches in followups.
>
> Please, help with review and testing. If nobody objects, in a few days it
> goes into #for-next.
>
> Shortlog:
>       fuse_ctl_add_conn(): fix nlink breakage in case of early failure
>       tracefs: fix a leak in eventfs_create_events_dir()
>       new helper: simple_remove_by_name()
>       new helper: simple_done_creating()
>       introduce a flag for explicitly marking persistently pinned dentries
>       primitives for maintaining persisitency
>       convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
>       convert ramfs and tmpfs
>       procfs: make /self and /thread_self dentries persistent
>       configfs, securityfs: kill_litter_super() not needed
>       convert xenfs
>       convert smackfs
>       convert hugetlbfs
>       convert mqueue
>       convert bpf
>       convert dlmfs
>       convert fuse_ctl
>       convert pstore
>       convert tracefs
>       convert debugfs
>       debugfs: remove duplicate checks in callers of start_creating()
>       convert efivarfs
>       convert spufs
>       convert ibmasmfs
>       ibmasmfs: get rid of ibmasmfs_dir_ops
>       convert devpts
>       binderfs: use simple_start_creating()
>       binderfs_binder_ctl_create(): kill a bogus check
>       convert binderfs
>       autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
>       convert autofs
>       convert binfmt_misc
>       selinuxfs: don't stash the dentry of /policy_capabilities
>       selinuxfs: new helper for attaching files to tree
>       convert selinuxfs
>       functionfs: don't abuse ffs_data_closed() on fs shutdown
>       functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
>       functionfs: need to cancel ->reset_work in ->kill_sb()
>       functionfs: fix the open/removal races
>       functionfs: switch to simple_remove_by_name()
>       convert functionfs
>       gadgetfs: switch to simple_remove_by_name()
>       convert gadgetfs
>       hypfs: don't pin dentries twice
>       hypfs: switch hypfs_create_str() to returning int
>       hypfs: swich hypfs_create_u64() to returning int
>       convert hypfs
>       convert rpc_pipefs
>       convert nfsctl
>       convert rust_binderfs
>       get rid of kill_litter_super()
>       convert securityfs
>       kill securityfs_recursive_remove()
>       d_make_discardable(): warn if given a non-persistent dentry
>
> Diffstat:
>  Documentation/filesystems/porting.rst | 7 ++
>  arch/powerpc/platforms/cell/spufs/inode.c | 17 ++-
>  arch/s390/hypfs/hypfs.h | 6 +-
>  arch/s390/hypfs/hypfs_diag_fs.c | 60 ++++------
>  arch/s390/hypfs/hypfs_vm_fs.c | 21 ++--
>  arch/s390/hypfs/inode.c | 82 +++++--------
>  drivers/android/binder/rust_binderfs.c | 121 ++++++-------------
>  drivers/android/binderfs.c | 82 +++----------
>  drivers/base/devtmpfs.c | 2 +-
>  drivers/misc/ibmasm/ibmasmfs.c | 24 ++--
>  drivers/usb/gadget/function/f_fs.c | 144 +++++++++++++----------
>  drivers/usb/gadget/legacy/inode.c | 49 ++++----
>  drivers/xen/xenfs/super.c | 2 +-
>  fs/autofs/inode.c | 2 +-
>  fs/autofs/root.c | 11 +-
>  fs/binfmt_misc.c | 69 ++++++-----
>  fs/configfs/dir.c | 10 +-
>  fs/configfs/inode.c | 3 +-
>  fs/configfs/mount.c | 2 +-
>  fs/dcache.c | 111 +++++++++++-------
>  fs/debugfs/inode.c | 32 ++----
>  fs/devpts/inode.c | 57 ++++-----
>  fs/efivarfs/inode.c | 7 +-
>  fs/efivarfs/super.c | 5 +-
>  fs/fuse/control.c | 38 +++---
>  fs/hugetlbfs/inode.c | 12 +-
>  fs/internal.h | 1 -
>  fs/libfs.c | 52 +++++++--
>  fs/nfsd/nfsctl.c | 18 +--
>  fs/ocfs2/dlmfs/dlmfs.c | 8 +-
>  fs/proc/base.c | 6 +-
>  fs/proc/internal.h | 1 +
>  fs/proc/root.c | 14 +--
>  fs/proc/self.c | 10 +-
>  fs/proc/thread_self.c | 11 +-
>  fs/pstore/inode.c | 7 +-
>  fs/ramfs/inode.c | 8 +-
>  fs/super.c | 8 --
>  fs/tracefs/event_inode.c | 7 +-
>  fs/tracefs/inode.c | 13 +--
>  include/linux/dcache.h | 4 +-
>  include/linux/fs.h | 6 +-
>  include/linux/proc_fs.h | 2 -
>  include/linux/security.h | 2 -
>  init/do_mounts.c | 2 +-
>  ipc/mqueue.c | 12 +-
>  kernel/bpf/inode.c | 15 +--
>  mm/shmem.c | 38 ++----
>  net/sunrpc/rpc_pipe.c | 27 ++---
>  security/apparmor/apparmorfs.c | 13 ++-
>  security/inode.c | 35 +++---
>  security/selinux/selinuxfs.c | 185 +++++++++++++----------------
>  security/smack/smackfs.c | 2 +-
>  53 files changed, 649 insertions(+), 834 deletions(-)
>
> Overview:
>
> First two commits are bugfixes (fusectl and tracefs resp.)
>
> [1/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> [2/54] tracefs: fix a leak in eventfs_create_events_dir()
>
> Next, two commits adding a couple of useful helpers, the next three adding
> the infrastructure and the rest consists of per-filesystem conversions.
>
> [3/54] new helper: simple_remove_by_name()
> [4/54] new helper: simple_done_creating()
>         end_creating_path() analogue for internal object creation; unlike
> end_creating_path() no mount is passed to it (or guaranteed to exist, for
> that matter - it might be used during the filesystem setup, before the
> superblock gets attached to any mounts).
>
> Infrastructure:
> [5/54] introduce a flag for explicitly marking persistently pinned dentries
>         * introduce the new flag
>         * teach shrink_dcache_for_umount() to handle it (i.e. remove
> and drop refcount on anything that survives to umount with that flag
> still set)
>         * teach kill_litter_super() that anything with that flag does
> *not* need to be unpinned.
> [6/54] primitives for maintaining persisitency
>         * d_make_persistent(dentry, inode) - bump refcount, mark persistent
> and make hashed positive. Return value is a borrowed reference to dentry;
> it can be used until something removes persistency (at the very least,
> until the parent gets unlocked, but some filesystems may have stronger
> exclusion).
>         * d_make_discardable() - remove persistency mark and drop reference.
>
> NOTE: at that stage d_make_discardable() does not reject dentries not
> marked persistent - it acts as if the mark been set.
>
> Rationale: less noise in series splitup that way. We want (and on the
> next commit will get) simple_unlink() to do the right thing - remove
> persistency, if it's there. However, it's used by many filesystems.
> We would have either to convert them all at once or split simple_unlink()
> into "want persistent" and "don't want persistent" versions, the latter
> being the old one. In the course of the series almost all callers
> would migrate to the replacement, leaving only two pathological cases
> with the old one. The same goes for simple_rmdir() (two callers left in
> the end), simple_recursive_removal() (all callers gone in the end), etc.
> That's a lot of noise and it's easier to start with d_make_discardable()
> quietly accepting non-persistent dentries, then, in the end, add private
> copies of simple_unlink() and simple_rmdir() for two weird users (configfs
> and apparmorfs) and have those use dput() instead of d_make_discardable().
> At that point we'd be left with all callers of d_make_discardable()
> always passing persistent dentries, allowing to add a warning in it.
>
> [7/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
>         See above re quietly accepting non-peristent dentries in
> simple_unlink(), simple_rmdir(), etc.
>
> Converting filesystems:
> [8/54] convert ramfs and tmpfs
> [9/54] procfs: make /self and /thread_self dentries persistent
> [10/54] configfs, securityfs: kill_litter_super() not needed
> [11/54] convert xenfs
> [12/54] convert smackfs
> [13/54] convert hugetlbfs
> [14/54] convert mqueue
> [15/54] convert bpf
> [16/54] convert dlmfs
> [17/54] convert fuse_ctl
> [18/54] convert pstore
> [19/54] convert tracefs
> [20/54] convert debugfs
> [21/54] debugfs: remove duplicate checks in callers of start_creating()
> [22/54] convert efivarfs
> [23/54] convert spufs
> [24/54] convert ibmasmfs
> [25/54] ibmasmfs: get rid of ibmasmfs_dir_ops
> [26/54] convert devpts
> [27/54] binderfs: use simple_start_creating()
> [28/54] binderfs_binder_ctl_create(): kill a bogus check
> [29/54] convert binderfs
> [30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> [31/54] convert autofs
> [32/54] convert binfmt_misc
> [33/54] selinuxfs: don't stash the dentry of /policy_capabilities
> [34/54] selinuxfs: new helper for attaching files to tree
> [35/54] convert selinuxfs
>
> Several functionfs fixes, before converting it, to make life
> simpler for backporting:
> [36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown
> [37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> [38/54] functionfs: need to cancel ->reset_work in ->kill_sb()
> [39/54] functionfs: fix the open/removal races
>
> ... and back to filesystems conversions:
>
> [40/54] functionfs: switch to simple_remove_by_name()
> [41/54] convert functionfs
> [42/54] gadgetfs: switch to simple_remove_by_name()
> [43/54] convert gadgetfs
> [44/54] hypfs: don't pin dentries twice
> [45/54] hypfs: switch hypfs_create_str() to returning int
> [46/54] hypfs: swich hypfs_create_u64() to returning int
> [47/54] convert hypfs
> [48/54] convert rpc_pipefs
> [49/54] convert nfsctl
> [50/54] convert rust_binderfs
>
> ... and no kill_litter_super() callers remain, so we
> can take it out:
> [51/54] get rid of kill_litter_super()
>
> Followups:
> [52/54] convert securityfs
>         That was the last remaining user of simple_recursive_removal()
> that did *not* mark things persistent. Now the only places where
> d_make_discardable() is still called for dentries that are not marked
> persistent are the calls of simple_{unlink,rmdir}() in configfs and
> apparmorfs.
>
> [53/54] kill securityfs_recursive_remove()
>         Unused macro...
>
> [54/54] d_make_discardable(): warn if given a non-persistent dentry
>
> At this point there are very few call chains that might lead to
> d_make_discardable() on a dentry that hadn't been made persistent:
> calls of simple_unlink() and simple_rmdir() in configfs and
> apparmorfs.
>
> Both filesystems do pin (part of) their contents in dcache, but
> they are currently playing very unusual games with that. Converting
> them to more usual patterns might be possible, but it's definitely
> going to be a long series of changes in both cases.
>
> For now the easiest solution is to have both stop using simple_unlink()
> and simple_rmdir() - that allows to make d_make_discardable() warn
> when given a non-persistent dentry.
>
> Rather than giving them full-blown private copies (with calls of
> d_make_discardable() replaced with dput()), let's pull the parts of
> simple_unlink() and simple_rmdir() that deal with timestamps and link
> counts into separate helpers (__simple_unlink() and __simple_rmdir()
> resp.) and have those used by configfs and apparmorfs.
>
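For anyone following along, the bookkeeping described in the quoted cover letter can be modelled in user space roughly as below. This is only a sketch of the idea, not kernel code: struct model_dentry and the model_*() functions are hypothetical names made up for illustration, there is no locking, and the choice to leave the refcount untouched when the discardable call is given a non-persistent dentry is an assumption of this model (the cover letter only says the final version warns).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct model_dentry {
	int  refcount;    /* stands in for the dentry reference count */
	bool persistent;  /* stands in for DCACHE_PERSISTENT */
	bool positive;    /* stands in for "has an inode attached" */
	bool hashed;      /* stands in for "present in the dcache hash" */
};

/*
 * Models d_make_persistent() as described above: bump the refcount,
 * mark persistent, make hashed positive. The flag now "owns" that +1.
 * Returns a borrowed reference (the caller did not gain a count).
 */
struct model_dentry *model_make_persistent(struct model_dentry *d)
{
	d->refcount++;
	d->persistent = true;
	d->positive = true;
	d->hashed = true;
	return d;
}

/*
 * Models the end-of-series d_make_discardable(): clear the mark and
 * drop the reference it was responsible for. Returns false (after a
 * warning) for a non-persistent dentry; not touching the refcount in
 * that case is an assumption of this model.
 */
bool model_make_discardable(struct model_dentry *d)
{
	if (!d->persistent) {
		fprintf(stderr, "model: discardable on non-persistent dentry\n");
		return false;
	}
	assert(d->refcount > 0);  /* the flag claimed a +1, so it must exist */
	d->persistent = false;
	d->refcount--;
	return true;
}

/*
 * Models what shrink_dcache_for_umount() is described as doing: strip
 * the flag from everything still marked at umount and drop the
 * matching reference only if the flag had been set.
 */
void model_shrink_for_umount(struct model_dentry *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (v[i].persistent) {
			v[i].persistent = false;
			v[i].refcount--;
		}
	}
}
```

The point of the flag in this model is that removal and umount no longer have to guess whether the extra reference is still there: the flag says so, and whichever of the two paths clears it is the one that drops the count.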
Hi Al,

When I apply this patchset my Pixel 6 no longer enumerates on lsusb or ADB.
It was quite hard to bisect to this point, as the failure is non-deterministic
and seems to be setup specific. Note, I am using android-mainline, but my
understanding is that this build does not have any out-of-tree USB patches,
and that there are no vendor hooks in the build.

My apologies as I can't offer any other clues; there are no obviously bad
dmesg logs and I'm still working on narrowing down the exact commit(s) that
started this, but just wanted to send an FYI in case something stands out as
obvious.

Thanks!
Sam
