Re: deadlock in synchronize_srcu() in debugfs?

Johannes Berg Fri, 24 Mar 2017 02:27:51 -0700

Hi,

On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote:
> On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> > Isn't it possible for the following to happen?
> > 
> > CPU1                                        CPU2
> > 
> > mutex_lock(&M); // acquires mutex
> >                                     full_proxy_xyz();
> >                                     srcu_read_lock(&debugfs_srcu);
> >                                     real_fops->xyz();
> >                                     mutex_lock(&M); // waiting for mutex
> > debugfs_remove(F);
> > synchronize_srcu(&debugfs_srcu);

> So I'm pretty sure that this can happen. I'm not convinced that it's
> happening here, but still.
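Here is the interleaving above replayed step by step as a minimal single-threaded sketch in C. This is a toy model, not kernel code. struct toy_state, replay_interleaving() and the helper names are hypothetical. The mutex is reduced to an owner field, and SRCU is reduced to a count of read-side sections that started before synchronize_srcu() was called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: a "CPU" is blocked when the
 * resource it needs at its current step is unavailable. */
enum cpu_id { CPU_NONE = 0, CPU1 = 1, CPU2 = 2 };

struct toy_state {
	enum cpu_id mutex_owner; /* who holds M, CPU_NONE if free */
	int srcu_readers;        /* read-side sections active before the grace period */
};

/* CPU2: full_proxy_xyz() -> srcu_read_lock() -> real_fops->xyz() -> mutex_lock(&M) */
static bool cpu2_step_mutex_lock(struct toy_state *s)
{
	if (s->mutex_owner != CPU_NONE && s->mutex_owner != CPU2)
		return false;    /* would sleep waiting for M */
	s->mutex_owner = CPU2;
	return true;
}

/* CPU1: debugfs_remove(F) -> synchronize_srcu() can only complete once
 * every pre-existing reader has left its read-side section. */
static bool cpu1_step_synchronize(const struct toy_state *s)
{
	return s->srcu_readers == 0;
}

/* Replays the interleaving from the diagram; returns true if both CPUs
 * end up waiting on each other. */
static bool replay_interleaving(void)
{
	struct toy_state s = { CPU_NONE, 0 };

	s.mutex_owner = CPU1;                      /* CPU1: mutex_lock(&M) succeeds */
	s.srcu_readers++;                          /* CPU2: srcu_read_lock(&debugfs_srcu) */
	bool cpu2_ok = cpu2_step_mutex_lock(&s);   /* CPU2 needs M, held by CPU1 */
	bool cpu1_ok = cpu1_step_synchronize(&s);  /* CPU1 needs CPU2's reader to finish */

	/* Neither can make progress: CPU2 only leaves its read-side section
	 * after getting M, and CPU1 only releases M after synchronize_srcu()
	 * returns. */
	return !cpu2_ok && !cpu1_ok;
}
```

With this ordering, replay_interleaving() returns true. Neither CPU can take the next step the other is waiting for.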

I'm a bit confused, in that SRCU, of course, doesn't wait until all the
readers are done - that'd be a regular reader/writer lock or something.

However, it does (have to) wait until all the currently active read-
side sections have terminated, which still leads to a deadlock in the
example above, I think?
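The difference between "all readers" and "currently active read-side sections" can be sketched with a toy two-slot counter. It loosely follows the idea behind the original SRCU design. Everything here is hypothetical and single-threaded: struct toy_srcu, toy_gp_start() and toy_gp_can_complete() are not real functions. The real implementation uses per-CPU counters and memory barriers, and its details have changed over time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy, loosely modeled on SRCU's two counter slots. */
struct toy_srcu {
	int idx;       /* slot that new readers use */
	int count[2];  /* active readers per slot */
};

static int toy_srcu_read_lock(struct toy_srcu *sp)
{
	int idx = sp->idx;

	sp->count[idx]++;
	return idx;    /* caller passes this back to unlock */
}

static void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
	assert(sp->count[idx] > 0);
	sp->count[idx]--;
}

/* Start of a grace period: steer new readers to the other slot and
 * return the slot whose readers must drain. */
static int toy_gp_start(struct toy_srcu *sp)
{
	int old = sp->idx;

	sp->idx = !old;
	return old;
}

/* The grace period can end once the old slot is empty; readers that
 * started after toy_gp_start() sit in the other slot and are ignored. */
static bool toy_gp_can_complete(const struct toy_srcu *sp, int old)
{
	return sp->count[old] == 0;
}
```

A reader that enters after toy_gp_start() lands in the new slot, so it cannot hold up the grace period. A reader that was already inside can hold it up. CPU2 in the example is such a reader, and if it is sleeping on a lock the updater holds, it never leaves.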

In his 2006 LWN article, Paul wrote:

    The designer of a given subsystem is responsible for: (1) ensuring
    that SRCU read-side sleeping is bounded and (2) limiting the amount
    of memory waiting for synchronize_srcu(). [1]

In the case of debugfs files acquiring locks, (1) can't really be
guaranteed, especially if those locks can be held while calling
synchronize_srcu() (via debugfs_remove()). So I still think the lockdep
annotations need to change: at a minimum, synchronize_srcu() should
carry an annotation of its own, so that lockdep can detect this.
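The sketch below is a rough illustration of why an annotation at synchronize_srcu() time would help. It records lockdep-style "held A while acquiring B" edges between lock classes and searches for a cycle. It is a toy built on assumptions, not lockdep's real algorithm or API; for one thing, lockdep treats read acquisitions specially. record_acquire(), has_cycle() and the CLASS_* names are hypothetical. The key modeling choice is to treat synchronize_srcu() as if it acquired the SRCU domain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy, not the kernel's lockdep: lock classes are small
 * integers, and "A held while acquiring B" records an edge A -> B. */
enum { CLASS_M, CLASS_DEBUGFS_SRCU, NR_CLASSES };

static bool dep[NR_CLASSES][NR_CLASSES];

/* Record an edge from every currently held class to the one acquired. */
static void record_acquire(const int *held, int nr_held, int acquired)
{
	for (int i = 0; i < nr_held; i++)
		dep[held[i]][acquired] = true;
}

/* Depth-first search for a path from 'from' to 'to'. */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* A cycle exists if some recorded edge a -> b is closed by a path b -> a. */
static bool has_cycle(void)
{
	for (int a = 0; a < NR_CLASSES; a++)
		for (int b = 0; b < NR_CLASSES; b++) {
			bool seen[NR_CLASSES] = { false };

			if (dep[a][b] && reachable(b, a, seen))
				return true;
		}
	return false;
}
```

If only the reader side is recorded (debugfs_srcu -> M, from CPU2 taking M inside its read-side section), there is no cycle. Adding the M -> debugfs_srcu edge that a synchronize_srcu() annotation would provide closes the loop.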

Now, I still suspect there's some other bug here in the case that I'm
seeing, because I don't actually see the "mutex_lock(&M); // waiting"
piece in the traces. I'll need to run this with some tracing on Monday
when the test guys are back from the weekend.

I'm also not sure how I can possibly fix this in the debugfs code of
mac80211 and friends, but that's perhaps a different story. Clearly,
this debugfs patch is a good thing: without it, the code would likely
have had use-after-free problems in this situation. But flagging the
potential deadlocks would make them a lot easier to find.

johannes
