On Thu, Feb 19, 2026 at 3:05 AM Jan Kara <[email protected]> wrote:
>
> On Wed 18-02-26 14:10:42, T.J. Mercier wrote:
> > On Wed, Feb 18, 2026 at 11:58 AM T.J. Mercier <[email protected]> wrote:
> > > On Wed, Feb 18, 2026 at 11:15 AM T.J. Mercier <[email protected]> wrote:
> > > > On Wed, Feb 18, 2026 at 10:37 AM Jan Kara <[email protected]> wrote:
> > > > > On Wed 18-02-26 10:06:35, T.J. Mercier wrote:
> > > > > > On Wed, Feb 18, 2026 at 10:01 AM Jan Kara <[email protected]> wrote:
> > > > > > > On Tue 17-02-26 19:22:31, T.J. Mercier wrote:
> > > > > > > > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > > > > > > > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > > > > > > > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > > > > > > > removed.
> > > > > > >
> > > > > > > Please see my email:
> > > > > > > https://lore.kernel.org/all/lc2jgt3yrvuvtdj2kk7q3rloie2c5mzyhfdy4zvxylx732voet@ol3kl4ackrpb
> > > > > > >
> > > > > > > I think this is actually a bug in kernfs...
> > > > > > >
> > > > > > >                                                                 Honza
> > > > > >
> > > > > > Thanks, I'm looking at this now. I've tried calling clear_nlink in
> > > > > > kernfs_iop_rmdir, but I've found that when we get back to vfs_rmdir
> > > > > > and shrink_dcache_parent is called, d_walk doesn't find any entries,
> > > > > > so shrink_kill->__dentry_kill is not called. I'm investigating why
> > > > > > that is...
> > > > >
> > > > > Strange because when I was experimenting with this in my VM I have seen
> > > > > __dentry_kill being called (if the dentries were created by someone looking
> > > > > up the names).
> > > >
> > > > Ahh yes, that's the difference. I was just doing mkdir
> > > > /sys/fs/cgroup/foo immediately followed by rmdir /sys/fs/cgroup/foo.
> > > > kernfs creates the dentries in kernfs_iop_lookup, so there were none
> > > > when I did the rmdir because I didn't cause any lookups.
> > > >
> > > > If I actually have a program watching
> > > > /sys/fs/cgroup/foo/memory.events, then I do see the __dentry_kill
> > > > calls, but despite the prior clear_nlink call, i_nlink is 1, so
> > > > fsnotify_inoderemove is skipped. Something must be incrementing it.
> > >
> > > The issue was that kernfs_remove unlinks the kernfs nodes, but doesn't
> > > clear_nlink when it does so. Adding that seems to work to generate
> > > IN_DELETE_SELF and IN_IGNORED. I'll do some more testing and get a
> > > patch ready.
> >
> > This works for the rmdir case, because
> > vfs_rmdir->shrink_dcache_parent->shrink_kill->__dentry_kill is invoked
> > when the user runs rmdir.
> >
> > However the case where a kernfs file is removed because a cgroup
> > subsys is deactivated does not work, because it occurs when the user
> > writes to cgroup.subtree_control. That is a vfs_write which calls
> > fsnotify_modify for cgroup.subtree_control, but (very reasonably)
> > there is no attempt made to clean up the dcache in VFS on writes.
>
> OK, and is this mostly a theoretical concern or do you practically expect
> someone to monitor subsystem files in a cgroup with inotify to learn that
> the subsystem has been disabled? It doesn't look very probable to me...

The rmdir case is the main one I'd like to fix. In production we don't
currently disable cgroup controllers after they have been enabled. I
agree the monitor-for-subsystem-disable case seems improbable.
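
For reference, the regular-filesystem behavior that the rmdir case is
being compared against can be observed from userspace with a small
sketch like the one below. It is only an illustration, not part of the
patch: it assumes Linux, calls inotify through ctypes, and the helper
names (watch_until_removed, demo_on_tempfile) are made up here. The
constant values are the ones from <sys/inotify.h>.

```python
import ctypes
import os
import select
import struct
import tempfile

# Event bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_DELETE_SELF = 0x00000400
IN_IGNORED = 0x00008000

_libc = ctypes.CDLL(None, use_errno=True)
# struct inotify_event header: int wd; uint32_t mask, cookie, len;
_EVENT_HEADER = struct.Struct("iIII")


def watch_until_removed(path, trigger, timeout=2.0):
    """Watch path, call trigger(), and return the event masks observed.

    Hypothetical helper: stops once IN_IGNORED arrives or when no event
    shows up within `timeout` seconds.
    """
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    masks = []
    try:
        wd = _libc.inotify_add_watch(
            fd, os.fsencode(path), IN_MODIFY | IN_DELETE_SELF
        )
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        trigger()
        while select.select([fd], [], [], timeout)[0]:
            buf = os.read(fd, 4096)
            offset = 0
            while offset < len(buf):
                _wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(
                    buf, offset
                )
                masks.append(mask)
                offset += _EVENT_HEADER.size + name_len
            if masks[-1] & IN_IGNORED:
                break
    finally:
        os.close(fd)
    return masks


def demo_on_tempfile():
    """Unlink a watched temporary file on a regular filesystem."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)  # no open descriptors, so unlink drops the inode
    return watch_until_removed(path, lambda: os.unlink(path))
```

Pointing watch_until_removed at /sys/fs/cgroup/foo/memory.events with
a trigger that removes the cgroup directory should, per the discussion
above, return no IN_DELETE_SELF or IN_IGNORED masks on an unpatched
kernel.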

> > So I think kernfs still needs to generate fsnotify events manually for
> > the cgroup_subtree_control_write->cgroup_apply_control_disable case.
> > Those removals happen via kernfs_remove_by_name->__kernfs_remove, so
> > that would look a lot like what I sent in this v3 patch, even if we
> > also add clear_nlink calls for the rmdir case.
>
> If there's a sensible usecase for monitoring of subsystem files being
> deleted, we could also d_delete() the dentry from cgroup_rm_file(). But
> maybe the performance overhead would be visible for some larger scale
> removals so maybe just using fsnotify_inoderemove() to paper over the
> problem would be easier if this case is really needed.
>
>                                                                 Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
