On Thu, Feb 12, 2026 at 01:58:13PM -0800, T.J. Mercier wrote:
> Currently some kernfs files (e.g. cgroup.events, memory.events) support
> inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> removed.
>
> This creates a problem for processes monitoring cgroups. For example, a
> service monitoring memory.events for memory.high breaches needs to know
> when a cgroup is removed to clean up its state. Where it's known that a
> cgroup is removed when all processes die, without IN_DELETE_SELF the
> service must resort to inefficient workarounds such as:
> 1. Periodically scanning procfs to detect process death (wastes CPU and
>    is susceptible to PID reuse).
> 2. Placing an additional IN_DELETE watch on the parent directory
>    (wastes resources managing double the watches).
> 3. Holding a pidfd for every monitored cgroup (can exhaust file
>    descriptors).
>
> This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
> This allows applications to rely on a single existing watch on the file
> of interest (e.g. memory.events) to receive notifications for both
> modifications and the eventual removal of the file, as well as automatic
> watch descriptor cleanup, simplifying userspace logic and improving
> resource efficiency.

This looks very useful, but how will the application know that it can
rely on IN_DELETE_SELF from cgroups if this is not an opt-in feature?

Essentially, this is similar to the discussions on adding "remote"
fs notification support (e.g. for smb). In those discussions I insist
that "remote" notification should be opt-in (which is easy to do with
an fanotify init flag) and I claim that mixing "remote" events with
"local" events on the same group is undesired.

However, IN_IGNORED is generated when an inotify watch is removed
and IN_DELETE_SELF is generated when a vfs inode is destroyed.
When setting an inotify watch for IN_IGNORED|IN_DELETE_SELF there
has to be a vfs inode with an inotify mark attached, so why are those
events not generated already? What am I missing?

Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
while watching the parent? Because this is not how the API works.

I think it should be possible to set a super block fanotify watch
on cgroupfs and get all the FAN_DELETE_SELF events, but maybe we
do not allow this right now, I did not check - just wanted to give
you another direction to follow.

>
> Implementation details:
> The kernfs notification worker is updated to handle file deletion.
> fsnotify handles sending MODIFY events to both a watched file and its
> parent, but it does not handle sending a DELETE event to the parent and
> a DELETE_SELF event to the watched file in a single call. Therefore,
> separate fsnotify calls are made: one for the parent (DELETE) and one
> for the child (DELETE_SELF), while retaining the optimized single call

IN_DELETE_SELF and IN_IGNORED are special and I don't really mind
adding them to kernfs, seeing that they are very useful, but adding
IN_DELETE without adding IN_CREATE is very arbitrary and I don't
like it as much.

> for MODIFY events.
>
> Signed-off-by: T.J. Mercier <[email protected]>
> ---
>  fs/kernfs/dir.c             | 21 +++++++++++++++++++++
>  fs/kernfs/file.c            | 16 ++++++++++------
>  fs/kernfs/kernfs-internal.h |  3 +++
>  3 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 29baeeb97871..e5bda829fcb8 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -9,6 +9,7 @@
>
>  #include <linux/sched.h>
>  #include <linux/fs.h>
> +#include <linux/fsnotify_backend.h>
>  #include <linux/namei.h>
>  #include <linux/idr.h>
>  #include <linux/slab.h>
> @@ -1471,6 +1472,23 @@ void kernfs_show(struct kernfs_node *kn, bool show)
>         up_write(&root->kernfs_rwsem);
>  }
>
> +static void kernfs_notify_file_deleted(struct kernfs_node *kn)
> +{
> +       static DECLARE_WORK(kernfs_notify_deleted_work,
> +                           kernfs_notify_workfn);
> +
> +       guard(spinlock_irqsave)(&kernfs_notify_lock);
> +       /* may overwite already pending FS_MODIFY events */
> +       kn->attr.notify_event = FS_DELETE;
> +
> +       if (!kn->attr.notify_next) {
> +               kernfs_get(kn);
> +               kn->attr.notify_next = kernfs_notify_list;
> +               kernfs_notify_list = kn;
> +               schedule_work(&kernfs_notify_deleted_work);
> +       }
> +}
> +
>  static void __kernfs_remove(struct kernfs_node *kn)
>  {
>         struct kernfs_node *pos, *parent;
> @@ -1520,6 +1538,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
>                         struct kernfs_iattrs *ps_iattr =
>                                 parent ? parent->iattr : NULL;
>
> +                       if (kernfs_type(pos) == KERNFS_FILE)
> +                               kernfs_notify_file_deleted(pos);
> +
>                         /* update timestamps on the parent */
>                         down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index e978284ff983..2d21af3cfcad 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -37,8 +37,8 @@ struct kernfs_open_node {
>   */
>  #define KERNFS_NOTIFY_EOL ((void *)&kernfs_notify_list)
>
> -static DEFINE_SPINLOCK(kernfs_notify_lock);
> -static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> +DEFINE_SPINLOCK(kernfs_notify_lock);
> +struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
>
>  static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node
> *kn)
>  {
> @@ -909,7 +909,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t
> offset, int whence)
>         return ret;
>  }
>
> -static void kernfs_notify_workfn(struct work_struct *work)
> +void kernfs_notify_workfn(struct work_struct *work)
>  {
>         struct kernfs_node *kn;
>         struct kernfs_super_info *info;
> @@ -959,15 +959,19 @@ static void kernfs_notify_workfn(struct work_struct
> *work)
>                         if (p_inode) {
>                                 fsnotify(notify_event | FS_EVENT_ON_CHILD,
>                                          inode, FSNOTIFY_EVENT_INODE,
> -                                        p_inode, &name, inode, 0);
> +                                        p_inode, &name,
> +                                        (notify_event == FS_MODIFY) ?
> +                                               inode : NULL, 0);
>                                 iput(p_inode);
>                         }
>
>                         kernfs_put(parent);
>                 }
>
> -               if (!p_inode)
> -                       fsnotify_inode(inode, notify_event);
> +               if (notify_event == FS_DELETE)
> +                       fsnotify_inoderemove(inode);
> +               else if (!p_inode)
> +                       fsnotify_inode(inode, FS_MODIFY);

Didn't you mean notify_event?

Thanks,
Amir.
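
For reference, the regular-filesystem behaviour discussed above (a single
inotify watch on a file receiving IN_DELETE_SELF and then IN_IGNORED when
the file goes away) can be checked from userspace. This is only a minimal
sketch and not part of the patch: watch_file() and drain_events() are
hypothetical helper names, and it assumes a non-blocking inotify fd, a
local filesystem such as tmpfs or ext4, and no other links or open file
descriptors on the watched file.

```c
#include <stdint.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: watch one file for changes and for its own removal. */
static int watch_file(int ifd, const char *path)
{
	/* IN_IGNORED is always reported and does not need to be requested. */
	return inotify_add_watch(ifd, path, IN_MODIFY | IN_DELETE_SELF);
}

/*
 * Hypothetical helper: drain a non-blocking inotify fd and return the OR of
 * all event masks queued for watch descriptor wd. The loop stops when read()
 * fails, which for a non-blocking fd with an empty queue means EAGAIN.
 */
static uint32_t drain_events(int ifd, int wd)
{
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	uint32_t seen = 0;
	ssize_t len;

	while ((len = read(ifd, buf, sizeof(buf))) > 0) {
		for (char *p = buf; p < buf + len; ) {
			const struct inotify_event *ev =
				(const struct inotify_event *)p;

			if (ev->wd == wd)
				seen |= ev->mask;
			/* Events are variable length: header plus name. */
			p += sizeof(*ev) + ev->len;
		}
	}
	return seen;
}
```

With an fd from inotify_init1(IN_NONBLOCK), calling watch_file() on a
regular file, unlinking that file, and then calling drain_events() should
return a mask with both IN_DELETE_SELF and IN_IGNORED set. Under the
assumptions above, that is the behaviour kernfs files currently lack
according to the patch description.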

