On Mon 14-11-16 13:48:27, Amir Goldstein wrote:
> Handling fanotify events does not entail dereferencing fsnotify_mark
> beyond the point of fanotify_should_send_event().
> 
> For the case of permission events, which may block indefinitely,
> return -EAGAIN and then fsnotify() will call handle_event() again
> without a reference to the mark.
> 
> Without a reference to the mark, there is no need to call
> handle_event() under fsnotify_mark_srcu[0] read side lock,
> so we drop fsnotify_mark_srcu[0] while handling the event
> and grab it back before continuing to the next mark.
> 
> After this change, a blocking permission event will no longer
> block closing of any file descriptors of 0 priority groups,
> i.e: inotify and fanotify groups of class FAN_CLASS_NOTIF.
> 
> Reported-by: Miklos Szeredi <[email protected]>
> Signed-off-by: Amir Goldstein <[email protected]>

Well, this has a similar problem as my attempt to fix the issue. The
current mark can get removed from the mark list while waiting for userspace
response. ->next pointer is still valid at that moment but ->next->pprev
already points to mark preceding us (that's how rcu lists work). When
->next mark then gets removed from the list and destroyed (it may be
protected by the second srcu), our ->next pointer points to freed memory.

Furthermore I don't like the scheme of ->handle_event returning -EAGAIN and
then dropping the srcu lock - I'd rather have some helpers provided by the
generic fsnotify code to drop srcu lock. That needs some propagation of
information inside the ->handle_event and then the helper but that's IMO
cleaner. Anyway, that is just a technical detail.

I have some ideas how to fix up issues with my refcounting approach -
refcount would pin marks not only in memory but also in lists but I have
yet to see whether that works out sensibly (it would mean that dropping
mark reference would then need to take group->mark_mutex and that may cause
lock ordering issues).

                                                                Honza


> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index e0e5f7c..c7689ad 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -176,7 +176,7 @@ init: __maybe_unused
>  static int fanotify_handle_event(struct fsnotify_group *group,
>                                struct inode *inode,
>                                struct fsnotify_mark *inode_mark,
> -                              struct fsnotify_mark *fanotify_mark,
> +                              struct fsnotify_mark *vfsmnt_mark,
>                                u32 mask, void *data, int data_type,
>                                const unsigned char *file_name, u32 cookie)
>  {
> @@ -195,9 +195,16 @@ static int fanotify_handle_event(struct fsnotify_group 
> *group,
>       BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM);
>       BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
>  
> -     if (!fanotify_should_send_event(inode_mark, fanotify_mark, mask, data,
> -                                     data_type))
> -             return 0;
> +     if (inode_mark || vfsmnt_mark) {
> +             if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask,
> +                                             data, data_type))
> +                     return 0;
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> +             /* Ask to be called again without a reference to mark */
> +             if (mask & FAN_ALL_PERM_EVENTS)
> +                     return -EAGAIN;
> +#endif
> +     }
>  
>       pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
>                mask);
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index af5c523a..5b9a248 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -291,6 +291,29 @@ int fsnotify(struct inode *to_tell, __u32 mask, void 
> *data, int data_is,
>               ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask,
>                                   data, data_is, cookie, file_name);
>  
> +             /*
> +              * If handle_event() is going to block, we call it again
> +              * witout holding fsnotify_mark_srcu[0], which is protecting
> +              * the low priority mark lists.
> +              * We are still holding fsnotify_mark_srcu[1], which
> +              * is protecting the high priority marks in the first half
> +              * of the mark list, which is where we are at.
> +              */
> +             if (group->priority > 0 && ret == -EAGAIN) {
> +                     srcu_read_unlock(&fsnotify_mark_srcu[0], idx);
> +
> +                     ret = group->ops->handle_event(group, to_tell,
> +                                                    NULL, NULL,
> +                                                    mask, data, data_is,
> +                                                    file_name, cookie);
> +
> +                     /*
> +                      * We need to hold fsnotify_mark_srcu[0], because
> +                      * next mark may be low priority.
> +                      */
> +                     idx = srcu_read_lock(&fsnotify_mark_srcu[0]);
> +             }
> +
>               if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
>                       goto out;
>  
> -- 
> 2.7.4
> 
-- 
Jan Kara <[email protected]>
SUSE Labs, CR

Reply via email to