On Wed 18-02-26 14:10:42, T.J. Mercier wrote:
> On Wed, Feb 18, 2026 at 11:58 AM T.J. Mercier <[email protected]> wrote:
> > On Wed, Feb 18, 2026 at 11:15 AM T.J. Mercier <[email protected]> wrote:
> > > On Wed, Feb 18, 2026 at 10:37 AM Jan Kara <[email protected]> wrote:
> > > > On Wed 18-02-26 10:06:35, T.J. Mercier wrote:
> > > > > On Wed, Feb 18, 2026 at 10:01 AM Jan Kara <[email protected]> wrote:
> > > > > > On Tue 17-02-26 19:22:31, T.J. Mercier wrote:
> > > > > > > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > > > > > > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > > > > > > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > > > > > > removed.
> > > > > >
> > > > > > Please see my email:
> > > > > > https://lore.kernel.org/all/lc2jgt3yrvuvtdj2kk7q3rloie2c5mzyhfdy4zvxylx732voet@ol3kl4ackrpb
> > > > > >
> > > > > > I think this is actually a bug in kernfs...
> > > > > >
> > > > > >                                                                 Honza
> > > > >
> > > > > Thanks, I'm looking at this now. I've tried calling clear_nlink in
> > > > > kernfs_iop_rmdir, but I've found that when we get back to vfs_rmdir
> > > > > and shrink_dcache_parent is called, d_walk doesn't find any entries,
> > > > > so shrink_kill->__dentry_kill is not called. I'm investigating why
> > > > > that is...
> > > >
> > > > Strange because when I was experimenting with this in my VM I have seen
> > > > __dentry_kill being called (if the dentries were created by someone looking
> > > > up the names).
> > >
> > > Ahh yes, that's the difference. I was just doing mkdir
> > > /sys/fs/cgroup/foo immediately followed by rmdir /sys/fs/cgroup/foo.
> > > kernfs creates the dentries in kernfs_iop_lookup, so there were none
> > > when I did the rmdir because I didn't cause any lookups.
> > >
> > > If I actually have a program watching
> > > /sys/fs/cgroup/foo/memory.events, then I do see the __dentry_kill
> > > calls, but despite the prior clear_nlink call i_nlink is 1 so
> > > fsnotify_inoderemove is skipped. Something must be incrementing it.
> >
> > The issue was that kernfs_remove unlinks the kernfs nodes, but doesn't
> > clear_nlink when it does so. Adding that seems to work to generate
> > IN_DELETE_SELF and IN_IGNORED. I'll do some more testing and get a
> > patch ready.
> 
> This works for the rmdir case, because
> vfs_rmdir->shrink_dcache_parent->shrink_kill->__dentry_kill is invoked
> when the user runs rmdir.
> 
> However the case where a kernfs file is removed because a cgroup
> subsys is deactivated does not work, because it occurs when the user
> writes to cgroup.subtree_control. That is a vfs_write which calls
> fsnotify_modify for cgroup.subtree_control, but (very reasonably)
> there is no attempt made to clean up the dcache in VFS on writes.

OK, and is this mostly a theoretical concern or do you practically expect
someone to monitor subsystem files in a cgroup with inotify to learn that
the subsystem has been disabled? It doesn't look very probable to me...

> So I think kernfs still needs to generate fsnotify events manually for
> the cgroup_subtree_control_write->cgroup_apply_control_disable case.
> Those removals happen via kernfs_remove_by_name->__kernfs_remove, so
> that would look a lot like what I sent in this v3 patch, even if we
> also add clear_nlink calls for the rmdir case.

If there's a sensible use case for monitoring subsystem files being
deleted, we could also d_delete() the dentry from cgroup_rm_file(). But
the performance overhead might be visible for larger-scale removals, so
just using fsnotify_inoderemove() to paper over the problem may be
easier if this case is really needed.
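
For reference, the kind of userspace watcher under discussion could be
sketched roughly as below. This is only an illustrative sketch, not code
from the patch or from any existing tool: format_mask() and
watch_until_removed() are hypothetical helper names, and the path to watch
(for example a memory.events file in some cgroup) is left to the caller.
It uses only the standard inotify(7) calls and stops once the kernel
reports IN_IGNORED, i.e. once the watch has been dropped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the names of the events from this thread
 * that are set in mask into buf, separated by spaces. */
void format_mask(uint32_t mask, char *buf, size_t len)
{
	static const struct { uint32_t bit; const char *name; } names[] = {
		{ IN_MODIFY,      "IN_MODIFY" },
		{ IN_DELETE_SELF, "IN_DELETE_SELF" },
		{ IN_IGNORED,     "IN_IGNORED" },
	};

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(mask & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}

/* Hypothetical helper: watch path and print each event until the kernel
 * drops the watch (IN_IGNORED). Returns 0 on IN_IGNORED, -1 on error. */
int watch_until_removed(const char *path)
{
	char events[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	char text[64];
	int fd = inotify_init1(IN_CLOEXEC);

	if (fd < 0) {
		perror("inotify_init1");
		return -1;
	}
	if (inotify_add_watch(fd, path, IN_MODIFY | IN_DELETE_SELF) < 0) {
		perror("inotify_add_watch");
		close(fd);
		return -1;
	}
	for (;;) {
		ssize_t n = read(fd, events, sizeof(events));

		if (n <= 0) {
			perror("read");
			close(fd);
			return -1;
		}
		/* One read() may return several variable-length events. */
		for (char *p = events; p < events + n; ) {
			const struct inotify_event *ev =
				(const struct inotify_event *)p;

			format_mask(ev->mask, text, sizeof(text));
			printf("%s: %s\n", path, text);
			if (ev->mask & IN_IGNORED) {
				close(fd);
				return 0;
			}
			p += sizeof(*ev) + ev->len;
		}
	}
}
```

Calling watch_until_removed() on a file and then removing that file shows
whether the filesystem delivers IN_DELETE_SELF and IN_IGNORED. IN_IGNORED
is not requested in the watch mask because the kernel generates it on its
own when a watch is removed.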

                                                                Honza
-- 
Jan Kara <[email protected]>
SUSE Labs, CR
