Re: deadlock in synchronize_srcu() in debugfs?

2017-03-31 Thread Johannes Berg
On Fri, 2017-03-31 at 11:03 +0200, Nicolai Stange wrote: > > 2) > > There's a complete deadlock situation if this happens: > > > > CPU1 CPU2 > > > > debugfs_file_read(file="foo") mutex_lock(); > > srcu_read_lock(_srcu);

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-31 Thread Nicolai Stange
On Thu, Mar 30 2017, Johannes Berg wrote: > On Thu, 2017-03-30 at 12:27 +0200, Nicolai Stange wrote: >> So, please correct me if I'm wrong, there are two problems with >> indefinitely blocking debugfs files' fops: >> >> 1. The one which actually hung your system: >>    An indefinitely blocking

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Johannes Berg
On Thu, 2017-03-30 at 12:27 +0200, Nicolai Stange wrote: > So, please correct me if I'm wrong, there are two problems with > indefinitely blocking debugfs files' fops: > > 1. The one which actually hung your system: >    An indefinitely blocking debugfs_remove() while holding a lock. >    Other

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Nicolai Stange
So, please correct me if I'm wrong, there are two problems with indefinitely blocking debugfs files' fops: 1. The one which actually hung your system: An indefinitely blocking debugfs_remove() while holding a lock. Other tasks attempting to grab that same lock get stuck as well. 2. The

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Johannes Berg
On Thu, 2017-03-30 at 09:32 +0200, Nicolai Stange wrote: > > I wonder if holding the RTNL during the debugfs file removal is > really needed. I'll try to have a look in the next couple of days. Yes, I'm pretty much convinced that it is. I considered doing a deferred debugfs_remove() by holding

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Nicolai Stange
Hi Johannes, On Mon, Mar 27 2017, Johannes Berg wrote: >> > Before I go hunting - has anyone seen a deadlock in >> > synchronize_srcu() in debugfs_remove() before? >> >> Not yet. How reproducible is this? > > So ... this turned out to be a livelock of sorts. > > We have a debugfs file (not

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-27 Thread Johannes Berg
Hi, > > Before I go hunting - has anyone seen a deadlock in > > synchronize_srcu() in debugfs_remove() before? > > Not yet. How reproducible is this? So ... this turned out to be a livelock of sorts. We have a debugfs file (not upstream (yet?), it seems) that basically blocks reading data. At

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-27 Thread Johannes Berg
On Fri, 2017-03-24 at 13:20 -0700, Paul E. McKenney wrote:
>
> And I cannot resist adding this one:
>
> CPU 1                   CPU 2
> i = srcu_read_lock();   mutex_lock();
> mutex_lock();           synchronize_srcu();
> mutex_unlock();

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Paul E. McKenney
On Fri, Mar 24, 2017 at 12:33:22PM -0700, Paul E. McKenney wrote: > On Fri, Mar 24, 2017 at 07:51:47PM +0100, Johannes Berg wrote: > > > > > Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu() > > > must wait for.  But CPU2's reader cannot end until CPU1 releases > > > its lock,

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Paul E. McKenney
On Fri, Mar 24, 2017 at 07:51:47PM +0100, Johannes Berg wrote: > > > Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu() > > must wait for.  But CPU2's reader cannot end until CPU1 releases > > its lock, which it cannot do until after CPU2's reader ends.  Thus, > > as you say,

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Johannes Berg
> Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu() > must wait for.  But CPU2's reader cannot end until CPU1 releases > its lock, which it cannot do until after CPU2's reader ends.  Thus, > as you say, deadlock. > > The rule is that if you are within any kind of RCU read-side
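The circular wait described above can be checked mechanically. What follows is only a userspace sketch in C, not kernel code and not anything posted in this thread: the task names, the wait-for matrix, and the function srcu_model_deadlocks() are all hypothetical. It models the remover (the task calling synchronize_srcu()) and the reader (the task inside the SRCU read-side section) as nodes in a wait-for graph and reports whether the graph contains a cycle.

```c
#include <assert.h>

/* Userspace model only; every name here is hypothetical. */
enum { READER, REMOVER, NTASKS };

/*
 * waits_for[a][b] != 0 means task a cannot make progress until task b does.
 * Returns 1 if some chain of waits leads from 'start' back to 'start'.
 */
static int has_cycle_from(int waits_for[NTASKS][NTASKS], int start)
{
    int visited[NTASKS] = {0};
    int stack[NTASKS];
    int top = 0;

    assert(start >= 0 && start < NTASKS);
    stack[top++] = start;

    while (top > 0) {
        int cur = stack[--top];

        for (int next = 0; next < NTASKS; next++) {
            if (!waits_for[cur][next])
                continue;
            if (next == start)
                return 1;               /* walked back to where we began */
            if (!visited[next]) {
                visited[next] = 1;      /* each task is pushed at most once */
                stack[top++] = next;
            }
        }
    }
    return 0;
}

/*
 * reader_takes_mutex: the reader tries to take the mutex while inside
 *     its SRCU read-side critical section.
 * remover_holds_mutex_across_sync: the remover calls the modelled
 *     synchronize_srcu() with that same mutex held.
 */
int srcu_model_deadlocks(int reader_takes_mutex,
                         int remover_holds_mutex_across_sync)
{
    int waits_for[NTASKS][NTASKS] = {{0}};

    /* A grace period always waits for pre-existing readers. */
    waits_for[REMOVER][READER] = 1;

    /* The reader blocks on the remover only if both conditions hold. */
    if (reader_takes_mutex && remover_holds_mutex_across_sync)
        waits_for[READER][REMOVER] = 1;

    return has_cycle_from(waits_for, REMOVER);
}
```

In this model, srcu_model_deadlocks(1, 1) reports a cycle, while dropping either condition (the reader not taking the mutex, or the remover releasing it before waiting) breaks the cycle. The model says nothing about which of the two is practical to change in a given driver.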

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Paul E. McKenney
On Fri, Mar 24, 2017 at 10:24:46AM +0100, Johannes Berg wrote: > Hi, > > On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote: > > On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote: > > > Isn't it possible for the following to happen? > > > > > > CPU1

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Johannes Berg
Hi, On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote: > On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote: > > Isn't it possible for the following to happen? > > > > CPU1 CPU2 > > > > mutex_lock(); // acquires mutex > >

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Johannes Berg
On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote: > Isn't it possible for the following to happen? > > CPU1 CPU2 > > mutex_lock(); > full_proxy_xyz(); > srcu_read_lock(_srcu); >

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
Hi, > Not yet. How reproducible is this? Apparently quite. I haven't tried myself - it happens during some automated test that I need to analyse further. > > We're observing that with our (backported, but very recent) driver > > against 4.9 (and 4.10, I think), > > Do I understand it correctly

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
On Thu, 2017-03-23 at 08:37 -0700, Paul E. McKenney wrote: > I have not seen this, but my usual question for __synchronize_srcu() > is if some other task is blocked holding srcu_read_lock() for that > same srcu_struct. > Not as far as I can see - but that was the scenario I was outlining in my

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Paul E. McKenney
On Thu, Mar 23, 2017 at 03:54:46PM +0100, Johannes Berg wrote: > Hi, > > Before I go hunting - has anyone seen a deadlock in synchronize_srcu() > in debugfs_remove() before? We're observing that with our (backported, > but very recent) driver against 4.9 (and 4.10, I think), but there are > no

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Nicolai Stange
Hi Johannes, On Thu, Mar 23 2017, Johannes Berg wrote: > Before I go hunting - has anyone seen a deadlock in synchronize_srcu() > in debugfs_remove() before? Not yet. How reproducible is this? > We're observing that with our (backported, but very recent) driver > against 4.9 (and 4.10, I

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
On Thu, 2017-03-23 at 15:54 +0100, Johannes Berg wrote: > Before I go hunting - has anyone seen a deadlock in > synchronize_srcu() in debugfs_remove() before? Isn't it possible for the following to happen? CPU1 CPU2 mutex_lock();

deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
Hi, Before I go hunting - has anyone seen a deadlock in synchronize_srcu() in debugfs_remove() before? We're observing that with our (backported, but very recent) driver against 4.9 (and 4.10, I think), but there are no backports of any debugfs things so the backport itself doesn't seem like a
