Re: deadlock in synchronize_srcu() in debugfs?

2017-03-31 Thread Johannes Berg
On Fri, 2017-03-31 at 11:03 +0200, Nicolai Stange wrote:

> > 2)
> > There's a complete deadlock situation if this happens:
> > 
> > CPU1                            CPU2
> > 
> > debugfs_file_read(file="foo")   mutex_lock();
> > srcu_read_lock(&debugfs_srcu);  debugfs_remove(file="bar")
> > mutex_lock();                   synchronize_srcu(&debugfs_srcu)
> > 
> > This is intrinsically unrecoverable.
> 
> Let's address this in a second step.

I suspect that it's actually better to address both in the same step,
but whatever :)

> > That seems like a strange argument to me - something has to exist
> > for a process to be able to look up the file, and currently the
> > proxy also has to exist?
> 
> No, the proxies are created at file _open_ time and installed in the
> struct file.
> 
> Rationale: there are potentially many debugfs files with only few of
> them opened at a time and a proxy, i.e. a struct file_operations, is
> quite large.

Ok, that makes sense. But that's not really a show-stopper, is it?

You can either have a proxy or not have it at remove time, and if you
don't have one then you can remove safely, right? And if you do have a
proxy, then you have to write_lock() it.

Lookup of the proxy itself can still be protected by (S)RCU, so that
you can safely determine whether or not a proxy exists - you just can't
go into the debugfs file callbacks while you hold (S)RCU.

I'm handwaving though - there are problems here with freeing the proxy
again when you close a file. Perhaps something like
 * first, remove the pointer and wait for a grace period
 * write_lock() it to make sure nobody is still inside it
 * delete it now

works.
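
To make this a bit more concrete, I'm thinking of something roughly
like the following - purely a sketch, none of these names exist in
debugfs today, and I haven't thought through all the corner cases:

struct dbgfs_proxy {
        /* rw_semaphore rather than rwlock_t, since the fops may sleep */
        struct rw_semaphore rwsem;
        const struct file_operations *real_fops;
};

/*
 * Readers would do
 *      rcu_read_lock();
 *      proxy = rcu_dereference(*slot);
 *      if (proxy)
 *              down_read(&proxy->rwsem);
 *      rcu_read_unlock();
 * and only call into real_fops outside the RCU read-side section,
 * holding just the rwsem.
 */

static void dbgfs_proxy_free(struct dbgfs_proxy __rcu **slot)
{
        struct dbgfs_proxy *proxy = rcu_dereference_protected(*slot, 1);

        /* 1) unpublish the pointer and wait for a grace period, so
         *    no new lookup can find the proxy anymore */
        rcu_assign_pointer(*slot, NULL);
        synchronize_rcu();

        /* 2) write-lock it once, to make sure nobody who looked it
         *    up before the grace period is still inside */
        down_write(&proxy->rwsem);
        up_write(&proxy->rwsem);

        /* 3) only now is it safe to delete */
        kfree(proxy);
}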

> I'll work out a solution this weekend and send some RFC patches then.
> 
Thanks!

johannes


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-31 Thread Nicolai Stange
On Thu, Mar 30 2017, Johannes Berg wrote:

> On Thu, 2017-03-30 at 12:27 +0200, Nicolai Stange wrote:
>> So, please correct me if I'm wrong, there are two problems with
>> indefinitely blocking debugfs files' fops:
>> 
>> 1. The one which actually hung your system:
>>    An indefinitely blocking debugfs_remove() while holding a lock.
>>    Other tasks attempting to grab that same lock get stuck as well.
>> 
>> 2. The other one you've found, namely that the locking granularity is
>>    too coarse: a debugfs_remove() would get blocked by unrelated
>> files'
>>    pending fops.
>
> No, this isn't really an accurate description of the two problems.
>
>> AFAICS, the first one can't get resolved by simply refining the
>> blocking granularity: a debugfs_remove() on the indefinitely blocking
>> file would still block as well.
>
> Correct.
>
> The first problem - the one I ran into - is the following:
>
> 1)
> A given debugfs file's .read() was waiting for some event to happen
> (being a blocking file), and I was trying to debugfs_remove() some
> completely unrelated file; this got stuck.

I got it now. I was missing the "completely unrelated file" part.
(Admittedly, a related file would have made no sense at all -- the
remover would have been responsible for cancelling any indefinite
blocking in there, as you said).

> Due to me holding a lock while doing this debugfs_remove(), other tasks
> *also* got stuck, but that's just a sub-problem - having the
> debugfs_remove() of an unrelated file get stuck would already have been
> a problem - the fact that other tasks also got stuck was just an
> additional wrinkle.
>
> Mind - this is a livelock of sorts - if the debugfs file will ever make
> progress, the system can recover.
>


> 2)
> There's a complete deadlock situation if this happens:
>
> CPU1                            CPU2
>
> debugfs_file_read(file="foo")   mutex_lock();
> srcu_read_lock(&debugfs_srcu);  debugfs_remove(file="bar")
> mutex_lock();                   synchronize_srcu(&debugfs_srcu)
>
> This is intrinsically unrecoverable.

Let's address this in a second step.


> This is the core of the problem really - that you're tying completely
> unrelated processes together.
>
> Therefore, to continue using SRCU in this way means that you have to
> disallow blocking debugfs files. There may not be many of those, but
> any single one of them would be a problem.
>
> If we stop using SRCU this way we can discuss how we can fix it - but
> anything more coarse grained than per-file (which really makes SRCU
> unsuitable) would still have the same problem one way or another. And
> we haven't even addressed the deadlock situation (2 above) either.
>
>> When I did this, per-file reader/writer locks actually came to my
>> mind first. The problem here is that debugfs_use_file_start() must
>> grab the lock first and check whether the file has been deleted in
>> the meanwhile. But as it stands, there's nothing that would guarantee
>> the existence of the lock at the time it's to be taken.
>
> That seems like a strange argument to me - something has to exist for a
> process to be able to look up the file, and currently the proxy also
> has to exist?

No, the proxies are created at file _open_ time and installed in the
struct file.

Rationale: there are potentially many debugfs files with only few of them
opened at a time and a proxy, i.e. a struct file_operations, is quite
large.
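
In very rough code - not the actual fs/debugfs implementation,
glossing over where the real fops actually live, and with
proxy_open_sketch() being a made-up name - the open path does
something like:

static int proxy_open_sketch(struct inode *inode, struct file *filp,
                             const struct file_operations *real_fops)
{
        struct file_operations *proxy_fops;

        proxy_fops = kzalloc(sizeof(*proxy_fops), GFP_KERNEL);
        if (!proxy_fops)
                return -ENOMEM;

        /*
         * Each method of proxy_fops would be a wrapper entering the
         * SRCU read-side section, checking that the file hasn't been
         * removed yet and only then calling the real_fops method; the
         * wrapper installation is omitted here.  The proxy gets freed
         * again from the wrapper around ->release().
         */
        filp->f_op = proxy_fops;

        return real_fops->open ? real_fops->open(inode, filp) : 0;
}

So only the files which are actually open pay for the extra
struct file_operations.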


> So when a file is created you can allocate the proxy for it, and if you
> can look up the proxy object - perhaps even using plain RCU - then you
> also have the lock? IOW, instead of storing just the real_fops in
> d_fsdata, you can store a small object that holds a lock and the
> real_fops. You can always access that object, and lock it, but the
> real_fops inside it might eventually end up NULL which you handle
> through proxying. No?

As said, there isn't always a proxy object around.

Of course, attaching some sort of lock on a per-file basis should be
doable. I just refrained from doing it so far (and resorted to SRCU
instead) because I wasn't aware of those indefinite blockers and wanted
to avoid the additional complexity (namely avoiding use-after-frees on
that lock).

I'll work out a solution this weekend and send some RFC patches then.

Thanks for your clarifications!

Nicolai


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Johannes Berg
On Thu, 2017-03-30 at 12:27 +0200, Nicolai Stange wrote:
> So, please correct me if I'm wrong, there are two problems with
> indefinitely blocking debugfs files' fops:
> 
> 1. The one which actually hung your system:
>    An indefinitely blocking debugfs_remove() while holding a lock.
>    Other tasks attempting to grab that same lock get stuck as well.
> 
> 2. The other one you've found, namely that the locking granularity is
>    too coarse: a debugfs_remove() would get blocked by unrelated
> files'
>    pending fops.

No, this isn't really an accurate description of the two problems.

> AFAICS, the first one can't get resolved by simply refining the
> blocking granularity: a debugfs_remove() on the indefinitely blocking
> file would still block as well.

Correct.

The first problem - the one I ran into - is the following:

1)
A given debugfs file's .read() was waiting for some event to happen
(being a blocking file), and I was trying to debugfs_remove() some
completely unrelated file; this got stuck.
Due to me holding a lock while doing this debugfs_remove(), other tasks
*also* got stuck, but that's just a sub-problem - having the
debugfs_remove() of an unrelated file get stuck would already have been
a problem - the fact that other tasks also got stuck was just an
additional wrinkle.

Mind - this is a livelock of sorts - if the debugfs file will ever make
progress, the system can recover.

2)
There's a complete deadlock situation if this happens:

CPU1                            CPU2

debugfs_file_read(file="foo")   mutex_lock();
srcu_read_lock(&debugfs_srcu);  debugfs_remove(file="bar")
mutex_lock();                   synchronize_srcu(&debugfs_srcu)

This is intrinsically unrecoverable.

> > Yes, I'm pretty much convinced that it is. I considered doing a
> > deferred debugfs_remove() by holding the object around, but then I
> > can't be sure that I can later re-add a new object with the same
> > directory name,
> 
> Ah, I see. What about making debugfs provide separate
> 
> - debugfs_remove_start(): unlink file (c.f. __debugfs_remove()), can
> be
>   called under a lock
> and
> - debugfs_remove_wait(): the synchronize_srcu(), must not get called
>   under any lock
> 
> ?

I don't think it would really help much - the lock acquisition in my
case is in a completely different layer (cfg80211) than the code doing
debugfs_remove(), so delaying the debugfs_remove_wait() would mean
moving it somewhere else completely. Also, afaict you still have to
keep the object around until debugfs_remove_wait() has finished, so you
still have the name reuse problem.

> In short, I'd make calling debugfs_remove() with any lock being
> held illegal.
> 
> What do you think?

I think I'll stop using debugfs if that happens - too much hassle.

> > Half the thread here was about that - it's not easily doable
> > because
> > you'd have to teach lockdep about the special SRCU semantics first.
> > Since it doesn't even seem to do read/write locks properly that's
> > probably a massive undertaking.
> 
> I haven't looked into lockdep yet. So there is no way to ask lockdep
> "is there any lock held?" from debugfs_remove() before doing the
> synchronize_srcu()?

That's probably possible, yes.

> > > > Similarly, nobody should be blocking in debugfs files, like we
> > > > did
> > > > in ours, but also smsdvb_stats_read(), crtc_crc_open() look
> > > > like
> > > > they could block for quite a while.
> > > 
> > > Blocking in the debugfs files' fops shall be fine by itself,
> > > that's why SRCU is used for the removal stuff.
> > 
> > No, it isn't fine at all now!
> 
> "Shall" in the sense of "it's a requirement" and if it isn't
> fulfilled, it must be fixed. So I do agree with you here.

Ok. So let's assume that we allow blocking (indefinitely, at least
until you remove the file) for a debugfs file - then immediately due to
the way SRCU is used you've now made some debugfs_remove() calls block
indefinitely. Not just on the same file - that would be fine, and would
be a bug anyway, since before you remove a file you should wake up its
readers - but on any other file in the system.

Even splitting it into debugfs_remove_start() and debugfs_remove_wait()
will not do anything to fix this problem - debugfs_remove_wait() would
then block forever and the task that called it will not be able to make
any forward progress, until the completely unrelated other files
created some data or got closed.

This is the core of the problem really - that you're tying completely
unrelated processes together.

Therefore, to continue using SRCU in this way means that you have to
disallow blocking debugfs files. There may not be many of those, but
any single one of them would be a problem.

If we stop using SRCU this way we can discuss how we can fix it - but
anything more coarse grained than per-file (which really makes SRCU
unsuitable) would still have the same problem one way or another. And
we haven't even addressed the deadlock situation (2 above) either.

> When I did 

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Nicolai Stange
So, please correct me if I'm wrong, there are two problems with
indefinitely blocking debugfs files' fops:

1. The one which actually hung your system:
   An indefinitely blocking debugfs_remove() while holding a lock.
   Other tasks attempting to grab that same lock get stuck as well.

2. The other one you've found, namely that the locking granularity is
   too coarse: a debugfs_remove() would get blocked by unrelated files'
   pending fops.

AFAICS, the first one can't get resolved by simply refining the blocking
granularity: a debugfs_remove() on the indefinitely blocking file would
still block as well.

But:

On Thu, Mar 30 2017, Johannes Berg wrote:

> On Thu, 2017-03-30 at 09:32 +0200, Nicolai Stange wrote:
>> 
>> I wonder if holding the RTNL during the debugfs file removal is
>> really needed. I'll try to have a look in the next couple of days.
>
> Yes, I'm pretty much convinced that it is. I considered doing a
> deferred debugfs_remove() by holding the object around, but then I
> can't be sure that I can later re-add a new object with the same
> directory name,

Ah, I see. What about making debugfs provide separate

- debugfs_remove_start(): unlink file (c.f. __debugfs_remove()), can be
  called under a lock
and
- debugfs_remove_wait(): the synchronize_srcu(), must not get called
  under any lock

?
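
A caller would then look something like this (just a usage sketch of
this proposed, not yet existing API; example_teardown() is of course
made up):

static void example_teardown(struct dentry *debugfs_dir)
{
        rtnl_lock();
        /* ... whatever teardown actually needs the RTNL ... */
        debugfs_remove_start(debugfs_dir);      /* unlink only, OK under a lock */
        rtnl_unlock();

        /* the synchronize_srcu() part, with no locks held */
        debugfs_remove_wait(debugfs_dir);
}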

This would solve 1.). It would still be nice to detect those situations,
i.e. calls to debugfs_remove() with some lock being held, though.

In short, I'd make calling debugfs_remove() with any lock being
held illegal.

What do you think?


> Half the thread here was about that - it's not easily doable because
> you'd have to teach lockdep about the special SRCU semantics first.
> Since it doesn't even seem to do read/write locks properly that's
> probably a massive undertaking.

I haven't looked into lockdep yet. So there is no way to ask lockdep
"is there any lock held?" from debugfs_remove() before doing the
synchronize_srcu()?


> I also doubt that it's useful, because even if we did flag this sort of
> situation, it can occur across very different drivers - for example the
> netronome driver using rtnl_lock() inside its debugfs files, and
> mac80211 removing a completely unrelated debugfs file within
> rtnl_lock().

I'm proposing to convert the latter to a
debugfs_remove_start()/debugfs_remove_wait() pair.


>> > Similarly, nobody should be blocking in debugfs files, like we did
>> > in ours, but also smsdvb_stats_read(), crtc_crc_open() look like
>> > they could block for quite a while.
>> 
>> Blocking in the debugfs files' fops shall be fine by itself, that's
>> why SRCU is used for the removal stuff.
>
> No, it isn't fine at all now!

"Shall" in the sense of "it's a requirement" and if it isn't fulfilled,
it must be fixed. So I do agree with you here.


> If I have a debugfs file - like the one I
> had - that could block for external events (firmware, in my case), then
> *any* other debugfs_remove() in the whole system would also block
> indefinitely. That's a major problem!

Indeed.


> Ultimately, I'm not sure I see why one couldn't just have a
> reader/writer lock per *file*, which would be the ultimate granularity
> to solve this. Obviously, a blocking file has to be aborted before
> being removed itself, but there's nothing that says that you can't
> remove any other file - even from the same directory - while this one
> is in a blocking read.

When I did this, per-file reader/writer locks actually came to my mind
first. The problem here is that debugfs_use_file_start() must grab the
lock first and check whether the file has been deleted in the meanwhile.
But as it stands, there's nothing that would guarantee the existence of
the lock at the time it's to be taken.

Anyways, I'll have to think a little about possible solutions to mitigate
problem 2.)

Thanks,

Nicolai


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Johannes Berg
On Thu, 2017-03-30 at 09:32 +0200, Nicolai Stange wrote:
> 
> I wonder if holding the RTNL during the debugfs file removal is
> really needed. I'll try to have a look in the next couple of days.

Yes, I'm pretty much convinced that it is. I considered doing a
deferred debugfs_remove() by holding the object around, but then I
can't be sure that I can later re-add a new object with the same
directory name, so I have much more complexity - I'm not even sure that
can be solved at all, *perhaps* by renaming in debugfs first, but
that's major new complexity.

Enough complexity that I'm considering just removing debugfs usage
entirely and invent new mechanisms, or use sysfs, or something else
instead.

> Summarizing, the problem is the call to the indefinitely blocking
> synchronize_srcu() while having a lock held? I'll see whether I can
> ask lockdep if any lock is held and spit out a warning then.

Half the thread here was about that - it's not easily doable because
you'd have to teach lockdep about the special SRCU semantics first.
Since it doesn't even seem to do read/write locks properly that's
probably a massive undertaking.

I also doubt that it's useful, because even if we did flag this sort of
situation, it can occur across very different drivers - for example the
netronome driver using rtnl_lock() inside its debugfs files, and
mac80211 removing a completely unrelated debugfs file within
rtnl_lock().

> 
> > Similarly, nobody should be blocking in debugfs files, like we did
> > in ours, but also smsdvb_stats_read(), crtc_crc_open() look like
> > they could block for quite a while.
> 
> Blocking in the debugfs files' fops shall be fine by itself, that's
> why SRCU is used for the removal stuff.

No, it isn't fine at all now! If I have a debugfs file - like the one I
had - that could block for external events (firmware, in my case), then
*any* other debugfs_remove() in the whole system would also block
indefinitely. That's a major problem! The two other files listed above
can also block waiting for external events, afaict. I'm told that there
are more files in other wireless drivers too.

Basically, even with SRCU being used, you cannot have blocking files.
You have to treat everything as O_NONBLOCK, because if you don't a
completely unrelated debugfs_remove() will block until the file
produces data. IMHO that's completely unacceptable.

> Yes, there's only one global srcu_struct for debugfs. So far this
> hasn't been a problem and if I understand things correctly, it's also
> not the problem at hand? If it really becomes an issue, we can very
> well introduce per directory srcu_structs as you suggested.

No, this is exactly the problem. If I have one blocking file, and
remove any completely unrelated file elsewhere in the system, I need to
wait for the blocking file to have produced data. That just doesn't
scale.

Using a separate SRCU subsystem per directory will go some way, but it
would still be useful to have lockdep annotations there.

Ultimately, I'm not sure I see why one couldn't just have a
reader/writer lock per *file*, which would be the ultimate granularity
to solve this. Obviously, a blocking file has to be aborted before
being removed itself, but there's nothing that says that you can't
remove any other file - even from the same directory - while this one
is in a blocking read.

> Let me have a look
> - whether holding the RTNL lock while removing the debugfs files is
>   actually needed and
> - whether there is an easy way to spot similar scenarios and emit
>   a warning for them.
> 
> If this doesn't solve the problem, I'll have to think of a different
> way to fix this...

It solves - imho with unnecessary hardship on the caller of
debugfs_remove() - only half of the problem, namely the real deadlock.
It does nothing for the "blocking debugfs file" live-lock where forward
progress can be made as soon as the file has some data available -
which in practice in my scenario never happened as the data producer
was completely idle.

> > (*) before removing it we'd obviously wake up and thereby
> > more or less terminate the readers first
> 
> With the current implementation, I can't see an easy way to identify
> the tasks blocking on a particular debugfs file. But maybe this is
> resolvable and the way to go here...

No, this is unrelated. If I write a blocking debugfs file, then *of
course* I need to take care that before removing it I wake up all the
readers. But that's easy because I control how that file produces data,
so I can wake up the waitq and tell my own code inside the debugfs read
to return 0 (EOF).
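
I.e. roughly this pattern - a sketch only, not our actual driver code,
with made-up structure and function names:

struct blocking_dbgfs {
        wait_queue_head_t waitq;
        bool have_data;
        bool removing;
};

static ssize_t blocking_read_sketch(struct file *file, char __user *buf,
                                    size_t count, loff_t *ppos)
{
        struct blocking_dbgfs *bd = file->private_data;

        if (wait_event_interruptible(bd->waitq,
                                     bd->have_data || bd->removing))
                return -ERESTARTSYS;
        if (bd->removing)
                return 0;       /* EOF, the reader goes away */

        /* real data copy-out elided to keep the sketch short */
        return simple_read_from_buffer(buf, count, ppos, "data\n", 5);
}

/* called before debugfs_remove() of this file */
static void kick_readers_sketch(struct blocking_dbgfs *bd)
{
        bd->removing = true;
        wake_up(&bd->waitq);
}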

This would be entirely inappropriate as a general solution because
you'd have to kill the userspace process or something...

johannes


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-30 Thread Nicolai Stange
Hi Johannes,

On Mon, Mar 27 2017, Johannes Berg wrote:

>> > Before I go hunting - has anyone seen a deadlock in
>> > synchronize_srcu() in debugfs_remove() before?
>> 
>> Not yet. How reproducible is this?
>
> So ... this turned out to be a livelock of sorts.
>
> We have a debugfs file (not upstream (yet?), it seems) that basically
> blocks reading data.
>
> At the point of system hanging, there was a process reading from that
> file, with no data being generated.

>
> A second process was trying to remove a completely unrelated debugfs
> file (*), with the RTNL held.

I wonder if holding the RTNL during the debugfs file removal is really
needed. I'll try to have a look in the next couple of days.

>
> A third and many other processes were waiting to acquire the RTNL.
>
>
> Obviously, in light of things like nfp_net_debugfs_tx_q_read(),
> wil_write_file_reset(), lowpan_short_addr_get() and quite a few more,
> nobody in the whole system can now remove debugfs files while holding
> the RTNL. Not sure how many people that affects, but it's IMHO a pretty
> major new restriction, and one that isn't even flagged at all.

To be honest, I didn't have this scenario, i.e. removing a debugfs file
under a lock, in mind when writing this removal protection series.

Thank you very much for your debugging work and for pointing me to this
sort of problem!

Summarizing, the problem is the call to the indefinitely blocking
synchronize_srcu() while having a lock held? I'll see whether I can ask
lockdep if any lock is held and spit out a warning then.


> Similarly, nobody should be blocking in debugfs files, like we did in
> ours, but also smsdvb_stats_read(), crtc_crc_open() look like they
> could block for quite a while.

Blocking in the debugfs files' fops shall be fine by itself, that's why
SRCU is used for the removal stuff.
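
For reference, the protection scheme is roughly the following -
simplified, the real code lives in the per-open proxies, the function
names here are made up and file_still_alive() is only a stand-in for
the actual check:

static ssize_t proxied_read_sketch(struct file *filp, char __user *buf,
                                   size_t count, loff_t *ppos)
{
        const struct file_operations *real_fops;
        ssize_t ret = -EIO;
        int idx;

        idx = srcu_read_lock(&debugfs_srcu);    /* one global srcu_struct */
        real_fops = file_still_alive(filp);     /* stand-in for the real check */
        if (real_fops)
                ret = real_fops->read(filp, buf, count, ppos);
        srcu_read_unlock(&debugfs_srcu, idx);

        return ret;
}

/*
 * The removal side, after unhashing the dentry, then essentially does
 *      synchronize_srcu(&debugfs_srcu);
 * which waits until every reader above - for *any* debugfs file - has
 * left its SRCU read-side section.
 */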


> Again, there's no warning here that blocking in debugfs files can now
> indefinitely defer completely unrelated debugfs_remove() calls in the
> entire system.

Yes, there's only one global srcu_struct for debugfs. So far this hasn't
been a problem and if I understand things correctly, it's also not the
problem at hand? If it really becomes an issue, we can very well
introduce per directory srcu_structs as you suggested.


> Overall, while I can solve this problem for our driver, possibly by
> making the debugfs file return some dummy data periodically if no real
> data exists, which may not easily be possible for all such files, I'm
> not convinced that all of this really is the right thing to actually
> impose.

No, I agree: imposing dummy data reads certainly isn't.

Let me have a look
- whether holding the RTNL lock while removing the debugfs files is
  actually needed and
- whether there is an easy way to spot similar scenarios and emit
  a warning for them.

If this doesn't solve the problem, I'll have to think of a different way
to fix this...

> (*) before removing it we'd obviously wake up and thereby more
> or less terminate the readers first

With the current implementation, I can't see an easy way to identify the
tasks blocking on a particular debugfs file. But maybe this is
resolvable and the way to go here...


Thanks,

Nicolai


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-27 Thread Johannes Berg
Hi,

> > Before I go hunting - has anyone seen a deadlock in
> > synchronize_srcu() in debugfs_remove() before?
> 
> Not yet. How reproducible is this?

So ... this turned out to be a livelock of sorts.

We have a debugfs file (not upstream (yet?), it seems) that basically
blocks reading data.

At the point of system hanging, there was a process reading from that
file, with no data being generated.

A second process was trying to remove a completely unrelated debugfs
file (*), with the RTNL held.

A third and many other processes were waiting to acquire the RTNL.


Obviously, in light of things like nfp_net_debugfs_tx_q_read(),
wil_write_file_reset(), lowpan_short_addr_get() and quite a few more,
nobody in the whole system can now remove debugfs files while holding
the RTNL. Not sure how many people that affects, but it's IMHO a pretty
major new restriction, and one that isn't even flagged at all.


Similarly, nobody should be blocking in debugfs files, like we did in
ours, but also smsdvb_stats_read(), crtc_crc_open() look like they
could block for quite a while. Again, there's no warning here that
blocking in debugfs files can now indefinitely defer completely
unrelated debugfs_remove() calls in the entire system.

Overall, while I can solve this problem for our driver, possibly by
making the debugfs file return some dummy data periodically if no real
data exists, which may not easily be possible for all such files, I'm
not convinced that all of this really is the right thing to actually
impose. Perhaps if it was per directory, or per some kind of subsystem?

johannes

(*) before removing it we'd obviously wake up and thereby more
or less terminate the readers first


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-27 Thread Johannes Berg
On Fri, 2017-03-24 at 13:20 -0700, Paul E. McKenney wrote:
> 
> And I cannot resist adding this one:
> 
>   CPU 1                             CPU 2
>   i = srcu_read_lock(&mysrcu);      mutex_lock(&my_lock);
>   mutex_lock(&my_lock);             synchronize_srcu(&yoursrcu);
>   mutex_unlock(&my_lock);           mutex_unlock(&my_lock);
>   srcu_read_unlock(&mysrcu, i);
> 
>   CPU 3                             CPU 4
>   i = srcu_read_lock(&yoursrcu);    mutex_lock(&your_lock);
>   mutex_lock(&your_lock);           synchronize_srcu(&mysrcu);
>   mutex_unlock(&your_lock);         mutex_unlock(&your_lock);
>   srcu_read_unlock(&yoursrcu, i);
> 
> Removing the SRCU statements from any of these CPUs would break the
> deadlock.  This can be easily extended to a deadlock cycle involving
> any number of srcu_struct structures.
> 
> But this would still be a cycle involving an srcu_read_lock() and a
> synchronize_srcu() on the same srcu_struct, which is reassuring.

Right, you can cycle this indefinitely. lockdep has some kind of
maximum chain length I think. :)

johannes


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Paul E. McKenney
On Fri, Mar 24, 2017 at 12:33:22PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 24, 2017 at 07:51:47PM +0100, Johannes Berg wrote:
> > 
> > > Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu()
> > > must wait for.  But CPU2's reader cannot end until CPU1 releases
> > > its lock, which it cannot do until after CPU2's reader ends.  Thus,
> > > as you say, deadlock.
> > > 
> > > The rule is that if you are within any kind of RCU read-side critical
> > > section, you cannot directly or indirectly wait for a grace period
> > > from that same RCU flavor.
> > 
> > Right. This is indirect then, in a way.
> 
> Agreed, in a way.  ;-)
> 
> > > There are some challenges, though.  This is OK:
> > > 
> > >   CPU1                             CPU2
> > >   i = srcu_read_lock(&mysrcu);     mutex_lock(&my_lock);
> > >   mutex_lock(&my_lock);            i = srcu_read_lock(&mysrcu);
> > >   srcu_read_unlock(&mysrcu, i);    mutex_unlock(&my_lock);
> > >   mutex_unlock(&my_lock);          srcu_read_unlock(&mysrcu, i);
> > > 
> > >   CPU3
> > >   synchronize_srcu(&mysrcu);
> > > 
> > > This could be a deadlock for reader-writer locking, but not for SRCU.
> > 
> > Hmm, yes, that's a good point. If srcu_read_lock() was read_lock, and
> > synchronize_srcu() was write_lock(), then the write_lock() could stop
> > CPU2's read_lock() from acquiring the lock, and thus cause a deadlock.
> 
> Yes.
> 
> > However, I'm not convinced that lockdep handles reader/writer locks
> > correctly to start with, right now, since it *didn't* actually trigger
> > any warnings when I annotated SRCU as a reader/writer lock.
> 
> I haven't looked into lockdep enough to know either way.
> 
> > > This is also OK:
> > >   CPU1                             CPU2
> > >   i = srcu_read_lock(&mysrcu);     mutex_lock(&my_lock);
> > >   mutex_lock(&my_lock);            synchronize_srcu(&yoursrcu);
> > >   srcu_read_unlock(&mysrcu, i);    mutex_unlock(&my_lock);
> > >   mutex_unlock(&my_lock);
> > > 
> > > Here CPU1's read-side critical sections are for mysrcu, which is
> > > independent of CPU2's grace period for yoursrcu.
> > 
> > Right, but that's already covered by having separate a lockdep_map for
> > each SRCU subsystem (mysrcu, yoursrcu).
> 
> I hope so, but haven't proved that this would work in all possible cases.
> 
> > > So you could flag any lockdep cycle that contained a reader and a
> > > synchronous grace period for the same flavor of RCU, where for SRCU
> > > the identity of the srcu_struct structure is part of the flavor.
> > 
> > Right. Basically, I think SRCU should be like a reader/writer lock
> > (perhaps fixed to work right). The only difference seems to be the
> > scenario you outlined above (first of the two)?
> > 
> > Actually, given the scenario above, for lockdep purposes the
> > reader/writer lock is actually the same as a recursive lock, I guess?
> 
> Except that a recursive reader/writer lock can still have deadlocks
> involving the outermost reader that would not be deadlocks for the
> equivalent SRCU scenarios.
> 
> > You outlined a scenario in which the reader gets blocked due to a
> > writer (CPU3 doing a write_lock()) so the reader can still participate
> > in a deadlock cycle since it can - without any other locks being held
> > by CPU3 that participate - cause a deadlock between CPU1 and CPU2 here.
> > For lockdep then, even seeing the CPU1 and CPU2 scenarios should be
> > sufficient to flag a deadlock (*).
> 
> Might this be one of the reasons why lockdep has problems with
> reader-writer locks?
> 
> > This part then isn't true for SRCU, because there forward progress will
> > still be made. So for SRCU, the "reader" side really needs to be
> > connected with a "writer" side to form a deadlock cycle, unlike for a
> > reader/writer lock.
> 
> Yes, for SRCU, srcu_read_lock() itself never blocks, so it never
> participates directly in a deadlock cycle.  It has to be the case
> that something within the SRCU read-side critical section blocks
> and takes its place in the deadlock cycle.
> 
> Then again, if you didn't have something blocking within your SRCU
> read-side critical section, why would you be using SRCU instead of
> just plain RCU?  ;-)
> 
> > johannes
> > 
> > (*) technically only after checking that write_lock() is ever used, but
> > ... seems reasonable enough to assume that it will be used, since why
> > would anyone ever use a reader/writer lock if there are only readers?
> > That's a no-op.
> 
> Makes sense to me!  The only reasons I can come up with are things like
> shutting lockdep up when it wants a given lock read-held or some such.

And I cannot resist adding this one:

CPU 1                             CPU 2
i = srcu_read_lock(&mysrcu);      mutex_lock(&my_lock);
mutex_lock(&my_lock);             synchronize_srcu(&yoursrcu);
mutex_unlock(&my_lock);           mutex_unlock(&my_lock);
srcu_read_unlock(&mysrcu, i);

CPU 3                             CPU 4
i = srcu_read_lock(&yoursrcu);    mutex_lock(&your_lock);
mutex_lock(&your_lock);           synchronize_srcu(&mysrcu);
mutex_unlock(&your_lock);         mutex_unlock(&your_lock);
srcu_read_unlock(&yoursrcu, i);

Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Paul E. McKenney
On Fri, Mar 24, 2017 at 07:51:47PM +0100, Johannes Berg wrote:
> 
> > Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu()
> > must wait for.  But CPU2's reader cannot end until CPU1 releases
> > its lock, which it cannot do until after CPU2's reader ends.  Thus,
> > as you say, deadlock.
> > 
> > The rule is that if you are within any kind of RCU read-side critical
> > section, you cannot directly or indirectly wait for a grace period
> > from that same RCU flavor.
> 
> Right. This is indirect then, in a way.

Agreed, in a way.  ;-)

> > There are some challenges, though.  This is OK:
> > 
> > CPU1                            CPU2
> > i = srcu_read_lock(&mysrcu);    mutex_lock(&my_lock);
> > mutex_lock(&my_lock);           i = srcu_read_lock(&mysrcu);
> > srcu_read_unlock(&mysrcu, i);   mutex_unlock(&my_lock);
> > mutex_unlock(&my_lock);         srcu_read_unlock(&mysrcu, i);
> > 
> > CPU3
> > synchronize_srcu(&mysrcu);
> > 
> > This could be a deadlock for reader-writer locking, but not for SRCU.
> 
> Hmm, yes, that's a good point. If srcu_read_lock() was read_lock, and
> synchronize_srcu() was write_lock(), then the write_lock() could stop
> CPU2's read_lock() from acquiring the lock, and thus cause a deadlock.

Yes.

> However, I'm not convinced that lockdep handles reader/writer locks
> correctly to start with, right now, since it *didn't* actually trigger
> any warnings when I annotated SRCU as a reader/writer lock.

I haven't looked into lockdep enough to know either way.

> > This is also OK:
> > CPU1                            CPU2
> > i = srcu_read_lock(&mysrcu);    mutex_lock(&my_lock);
> > mutex_lock(&my_lock);           synchronize_srcu(&yoursrcu);
> > srcu_read_unlock(&mysrcu, i);   mutex_unlock(&my_lock);
> > mutex_unlock(&my_lock);
> > 
> > Here CPU1's read-side critical sections are for mysrcu, which is
> > independent of CPU2's grace period for yoursrcu.
> 
> Right, but that's already covered by having a separate lockdep_map for
> each SRCU subsystem (mysrcu, yoursrcu).

I hope so, but haven't proved that this would work in all possible cases.

> > So you could flag any lockdep cycle that contained a reader and a
> > synchronous grace period for the same flavor of RCU, where for SRCU
> > the identity of the srcu_struct structure is part of the flavor.
> 
> Right. Basically, I think SRCU should be like a reader/writer lock
> (perhaps fixed to work right). The only difference seems to be the
> scenario you outlined above (first of the two)?
> 
> Actually, given the scenario above, for lockdep purposes the
> reader/writer lock is actually the same as a recursive lock, I guess?

Except that a recursive reader/writer lock can still have deadlocks
involving the outermost reader that would not be deadlocks for the
equivalent SRCU scenarios.

> You outlined a scenario in which the reader gets blocked due to a
> writer (CPU3 doing a write_lock()) so the reader can still participate
> in a deadlock cycle since it can - without any other locks being held
> by CPU3 that participate - cause a deadlock between CPU1 and CPU2 here.
> For lockdep then, even seeing the CPU1 and CPU2 scenarios should be
> sufficient to flag a deadlock (*).

Might this be one of the reasons why lockdep has problems with
reader-writer locks?

> This part then isn't true for SRCU, because there forward progress will
> still be made. So for SRCU, the "reader" side really needs to be
> connected with a "writer" side to form a deadlock cycle, unlike for a
> reader/writer lock.

Yes, for SRCU, srcu_read_lock() itself never blocks, so it never
participates directly in a deadlock cycle.  It has to be the case
that something within the SRCU read-side critical section blocks
and takes its place in the deadlock cycle.

Then again, if you didn't have something blocking within your SRCU
read-side critical section, why would you be using SRCU instead of
just plain RCU?  ;-)

> johannes
> 
> (*) technically only after checking that write_lock() is ever used, but
> ... seems reasonable enough to assume that it will be used, since why
> would anyone ever use a reader/writer lock if there are only readers?
> That's a no-op.

Makes sense to me!  The only reasons I can come up with are things like
shutting lockdep up when it wants a given lock read-held or some such.

Thanx, Paul



Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Johannes Berg

> Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu()
> must wait for.  But CPU2's reader cannot end until CPU1 releases
> its lock, which it cannot do until after CPU2's reader ends.  Thus,
> as you say, deadlock.
> 
> The rule is that if you are within any kind of RCU read-side critical
> section, you cannot directly or indirectly wait for a grace period
> from that same RCU flavor.

Right. This is indirect then, in a way.

> There are some challenges, though.  This is OK:
> 
>   CPU1                            CPU2
>   i = srcu_read_lock(&mysrcu);    mutex_lock(&my_lock);
>   mutex_lock(&my_lock);           i = srcu_read_lock(&mysrcu);
>   srcu_read_unlock(&mysrcu, i);   mutex_unlock(&my_lock);
>   mutex_unlock(&my_lock);         srcu_read_unlock(&mysrcu, i);
> 
>   CPU3
>   synchronize_srcu(&mysrcu);
> 
> This could be a deadlock for reader-writer locking, but not for SRCU.

Hmm, yes, that's a good point. If srcu_read_lock() was read_lock, and
synchronize_srcu() was write_lock(), then the write_lock() could stop
CPU2's read_lock() from acquiring the lock, and thus cause a deadlock.
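
For what it's worth, that reader/writer variant is easy to play with in
userspace. A small pthreads analogue (clearly not the kernel
primitives; with glibc you would also have to select the
writer-preferring rwlock kind, e.g. via pthread_rwlockattr_setkind_np(),
to make the hang deterministic):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER; /* the "SRCU" domain */
static pthread_mutex_t my_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cpu1(void *arg)
{
        pthread_rwlock_rdlock(&rw);     /* "srcu_read_lock()" */
        sleep(1);                       /* let cpu2 take my_lock, cpu3 queue as writer */
        pthread_mutex_lock(&my_lock);   /* blocks behind cpu2 */
        pthread_mutex_unlock(&my_lock);
        pthread_rwlock_unlock(&rw);
        return NULL;
}

static void *cpu2(void *arg)
{
        pthread_mutex_lock(&my_lock);
        sleep(2);                       /* let cpu3's write lock get queued */
        pthread_rwlock_rdlock(&rw);     /* a writer-preferring rwlock parks us behind cpu3 */
        pthread_rwlock_unlock(&rw);
        pthread_mutex_unlock(&my_lock);
        return NULL;
}

static void *cpu3(void *arg)
{
        sleep(1);
        pthread_rwlock_wrlock(&rw);     /* "synchronize_srcu()" as a write lock */
        pthread_rwlock_unlock(&rw);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2, t3;

        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_create(&t2, NULL, cpu2, NULL);
        pthread_create(&t3, NULL, cpu3, NULL);
        pthread_join(t1, NULL);         /* hangs if the rwlock prefers writers */
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        puts("no deadlock");
        return 0;
}

With SRCU semantics cpu2's second read lock always succeeds and cpu3
simply waits for both readers to finish, so the whole thing unwinds -
which is exactly the difference being discussed here.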

However, I'm not convinced that lockdep handles reader/writer locks
correctly to start with, right now, since it *didn't* actually trigger
any warnings when I annotated SRCU as a reader/writer lock.

> This is also OK:
>   CPU1                            CPU2
>   i = srcu_read_lock(&mysrcu);    mutex_lock(&my_lock);
>   mutex_lock(&my_lock);           synchronize_srcu(&yoursrcu);
>   srcu_read_unlock(&mysrcu, i);   mutex_unlock(&my_lock);
>   mutex_unlock(&my_lock);
> 
> Here CPU1's read-side critical sections are for mysrcu, which is
> independent of CPU2's grace period for yoursrcu.

Right, but that's already covered by having a separate lockdep_map for
each SRCU subsystem (mysrcu, yoursrcu).

> So you could flag any lockdep cycle that contained a reader and a
> synchronous grace period for the same flavor of RCU, where for SRCU
> the identity of the srcu_struct structure is part of the flavor.

Right. Basically, I think SRCU should be like a reader/writer lock
(perhaps fixed to work right). The only difference seems to be the
scenario you outlined above (first of the two)?

Actually, given the scenario above, for lockdep purposes the
reader/writer lock is actually the same as a recursive lock, I guess?

You outlined a scenario in which the reader gets blocked due to a
writer (CPU3 doing a write_lock()) so the reader can still participate
in a deadlock cycle since it can - without any other locks being held
by CPU3 that participate - cause a deadlock between CPU1 and CPU2 here.
For lockdep then, even seeing the CPU1 and CPU2 scenarios should be
sufficient to flag a deadlock (*).

This part then isn't true for SRCU, because there forward progress will
still be made. So for SRCU, the "reader" side really needs to be
connected with a "writer" side to form a deadlock cycle, unlike for a
reader/writer lock.

johannes

(*) technically only after checking that write_lock() is ever used, but
... seems reasonable enough to assume that it will be used, since why
would anyone ever use a reader/writer lock if there are only readers?
That's a no-op.


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Paul E. McKenney
On Fri, Mar 24, 2017 at 10:24:46AM +0100, Johannes Berg wrote:
> Hi,
> 
> On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote:
> > On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> > > Isn't it possible for the following to happen?
> > > 
> > > CPU1                                CPU2
> > > 
> > > mutex_lock(&M); // acquires mutex
> > >                                     full_proxy_xyz();
> > >                                     srcu_read_lock(&debugfs_srcu);
> > >                                     real_fops->xyz();
> > >                                     mutex_lock(&M); // waiting for mutex
> > > debugfs_remove(F);
> > > synchronize_srcu(&debugfs_srcu);
> 
> > So I'm pretty sure that this can happen. I'm not convinced that it's
> > happening here, but still.
> 
> I'm a bit confused, in that SRCU, of course, doesn't wait until all the
> readers are done - that'd be a regular reader/writer lock or something.

Agreed, synchronize_srcu() does not have to wait for new readers
(as a reader/writer lock would), but it -does- have to wait for
pre-existing readers, like the one shown in your example above.

> However, it does (have to) wait until all the currently active read-
> side sections have terminated, which still leads to a deadlock in the
> example above, I think?

Yes.  CPU2 has a pre-existing reader that CPU1's synchronize_srcu()
must wait for.  But CPU2's reader cannot end until CPU1 releases
its lock, which it cannot do until after CPU2's reader ends.  Thus,
as you say, deadlock.

The rule is that if you are within any kind of RCU read-side critical
section, you cannot directly or indirectly wait for a grace period from
that same RCU flavor.
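
In its most direct form, that rule forbids something like the following
sketch (my_srcu is made up here):

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(my_srcu);

static void broken(void)
{
        int idx = srcu_read_lock(&my_srcu);

        /* Waits for all my_srcu readers - including this one: self-deadlock. */
        synchronize_srcu(&my_srcu);
        srcu_read_unlock(&my_srcu, idx);
}

The debugfs case above is the indirect version: the reader does not
call synchronize_srcu() itself, it merely blocks on a mutex whose
holder does.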

> In his 2006 LWN article Paul wrote:
> 
> The designer of a given subsystem is responsible for: (1) ensuring
> that SRCU read-side sleeping is bounded and (2) limiting the amount
> of memory waiting for synchronize_srcu(). [1]
> 
> In the case of debugfs files acquiring locks, (1) can't really be
> guaranteed, especially if those locks can be held while doing
> synchronize_srcu() [via debugfs_remove], so I still think the lockdep
> annotation needs to be changed to at least have some annotation at
> synchronize_srcu() time so we can detect this.

That would be very nice!

There are some challenges, though.  This is OK:

CPU1                            CPU2
i = srcu_read_lock(&mysrcu);    mutex_lock(&my_lock);
mutex_lock(&my_lock);           i = srcu_read_lock(&mysrcu);
srcu_read_unlock(&mysrcu, i);   mutex_unlock(&my_lock);
mutex_unlock(&my_lock);         srcu_read_unlock(&mysrcu, i);

CPU3
synchronize_srcu(&mysrcu);

This could be a deadlock for reader-writer locking, but not for SRCU.

This is also OK:

CPU1                            CPU2
i = srcu_read_lock(&mysrcu);    mutex_lock(&my_lock);
mutex_lock(&my_lock);           synchronize_srcu(&yoursrcu);
srcu_read_unlock(&mysrcu, i);   mutex_unlock(&my_lock);
mutex_unlock(&my_lock);

Here CPU1's read-side critical sections are for mysrcu, which is
independent of CPU2's grace period for yoursrcu.

So you could flag any lockdep cycle that contained a reader and a
synchronous grace period for the same flavor of RCU, where for SRCU the
identity of the srcu_struct structure is part of the flavor.

> Now, I still suspect there's some other bug here in the case that I'm
> seeing, because I don't actually see the "mutex_lock(); // waiting"
> piece in the traces. I'll need to run this with some tracing on Monday
> when the test guys are back from the weekend.
> 
> I'm also not sure how I can possibly fix this in debugfs in mac80211
> and friends, but that's perhaps a different story. Clearly, this
> debugfs patch is a good thing - the code will likely have had use-
> after-free problems in this situation without it. But flagging the
> potential deadlocks would make it a lot easier to find them.

No argument here!

Thanx, Paul



Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Johannes Berg
Hi,

On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote:
> On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> > Isn't it possible for the following to happen?
> > 
> > CPU1                                CPU2
> > 
> > mutex_lock(&M); // acquires mutex
> >                                     full_proxy_xyz();
> >                                     srcu_read_lock(&debugfs_srcu);
> >                                     real_fops->xyz();
> >                                     mutex_lock(&M); // waiting for mutex
> > debugfs_remove(F);
> > synchronize_srcu(&debugfs_srcu);

> So I'm pretty sure that this can happen. I'm not convinced that it's
> happening here, but still.

I'm a bit confused, in that SRCU, of course, doesn't wait until all the
readers are done - that'd be a regular reader/writer lock or something.

However, it does (have to) wait until all the currently active read-
side sections have terminated, which still leads to a deadlock in the
example above, I think?

In his 2006 LWN article Paul wrote:

The designer of a given subsystem is responsible for: (1) ensuring
that SRCU read-side sleeping is bounded and (2) limiting the amount
of memory waiting for synchronize_srcu(). [1]

In the case of debugfs files acquiring locks, (1) can't really be
guaranteed, especially if those locks can be held while doing
synchronize_srcu() [via debugfs_remove], so I still think the lockdep
annotation needs to be changed to at least have some annotation at
synchronize_srcu() time so we can detect this.

Now, I still suspect there's some other bug here in the case that I'm
seeing, because I don't actually see the "mutex_lock(); // waiting"
piece in the traces. I'll need to run this with some tracing on Monday
when the test guys are back from the weekend.

I'm also not sure how I can possibly fix this in debugfs in mac80211
and friends, but that's perhaps a different story. Clearly, this
debugfs patch is a good thing - the code would likely have had use-
after-free problems in this situation without it. But flagging the
potential deadlocks would make it a lot easier to find them.

johannes



Re: deadlock in synchronize_srcu() in debugfs?

2017-03-24 Thread Johannes Berg
On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> Isn't it possible for the following to happen?
> 
> CPU1                                CPU2
> 
> mutex_lock(&M);
>                                     full_proxy_xyz();
>                                     srcu_read_lock(&debugfs_srcu);
>                                     real_fops->xyz();
>                                     mutex_lock(&M);
> debugfs_remove(F);
> synchronize_srcu(&debugfs_srcu);


So I'm pretty sure that this can happen. I'm not convinced that it's
happening here, but still.

I tried to make lockdep flag it, but the only way I could get it to
flag it was to do this:

--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -235,7 +235,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
         preempt_disable();
         retval = __srcu_read_lock(sp);
         preempt_enable();
-        rcu_lock_acquire(&(sp)->dep_map);
+        lock_map_acquire(&(sp)->dep_map);
         return retval;
 }
 
@@ -249,7 +249,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
         __releases(sp)
 {
-        rcu_lock_release(&(sp)->dep_map);
+        lock_map_release(&(sp)->dep_map);
         __srcu_read_unlock(sp, idx);
 }
 
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index ef3bcfb15b39..0f9e542ca3f2 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -395,6 +395,9 @@ static void __synchronize_srcu(struct srcu_struct *sp, int trycount)
                          lock_is_held(&rcu_sched_lock_map),
                          "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
 
+        lock_map_acquire(&sp->dep_map);
+        lock_map_release(&sp->dep_map);
+
         might_sleep();
         init_completion(&rcu.completion);
 

The lock_map_acquire() in srcu_read_lock() is really not desired
though, since it will make recursion get flagged as bad. If I change
that to lock_map_acquire_read() though, the problem doesn't get flagged
for some reason. I thought it should.


Regardless though, I don't see a way to solve this problem for debugfs.
We have a ton of debugfs files in net/mac80211/debugfs.c that need to
acquire e.g. the RTNL (or other locks), and I'm not sure we can easily
avoid removing the debugfs files under the RTNL, since we get all our
configuration callbacks with the RTNL already held...
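
Reduced to a sketch, the shape of the problem is something like this
(a hypothetical file, not the actual mac80211 code):

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/rtnetlink.h>

static ssize_t foo_read(struct file *file, char __user *buf,
                        size_t count, loff_t *ppos)
{
        char tmp[16];
        int len;

        rtnl_lock();            /* runs under debugfs_srcu via the full proxy */
        len = scnprintf(tmp, sizeof(tmp), "%d\n", 42);
        rtnl_unlock();
        return simple_read_from_buffer(buf, count, ppos, tmp, len);
}

static const struct file_operations foo_fops = {
        .open   = simple_open,
        .read   = foo_read,
        .llseek = default_llseek,
};

/* ... while the teardown path is entered with the RTNL already held: */
static void foo_teardown(struct dentry *dir)
{
        ASSERT_RTNL();
        debugfs_remove_recursive(dir);  /* synchronize_srcu(&debugfs_srcu) inside */
}

If any reader is parked on rtnl_lock() in foo_read() while
foo_teardown() runs, that synchronize_srcu() can never finish.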

Need to think about that, but perhaps there's some other solution?

johannes


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
Hi,

> Not yet. How reproducible is this?

Apparently quite. I haven't tried myself - it happens during some
automated test that I need to analyse further.

> > We're observing that with our (backported, but very recent) driver
> > against 4.9 (and 4.10, I think),
> 
> Do I understand it correctly that this driver has been backported
> from 4.11-rcX to 4.9/10

Yes.

>  and that there isn't any issue with 4.11-rcX?

No, I can't say this, we haven't run that test.

> > but there are no backports of any debugfs things so the backport
> > itself doesn't seem like a likely problem.
> 
> Right, there haven't been any SRCU related changes to debugfs after
> 4.8.

Right.

> > sysrq-w shows a lot of tasks blocked on various locks (e.g. RTNL),
> > but
> > the ultimate problem is the wireless stack getting blocked on
> > debugfs_remove_recursive(), in __synchronize_srcu(), in
> > wait_for_completion() (while holding lots of locks, hence the other
> > tasks getting stuck).
> 
> Could you share a complete backtrace? For example, is the
> debugfs_remove_recursive() called from any debugfs file's fops and
> thus, possibly from within a SRCU read side critical section?

No, it's called from netlink:

[  884.634857] wpa_supplicant  D0  1769   1005 0x
[  884.634874]   8ca50633d140 8ca507b219c0 
8ca5455d4cc0
[  884.634898]  8ca54f599d98 97df431c36a0 878dadf3 
8ca50001
[  884.634927]  81ed67337c8469e4 8ca54f599d98 932a07b219c0 
8ca507b219c0
[  884.634952] Call Trace:
[  884.634969]  [] ? __schedule+0x303/0xb00
[  884.634985]  [] schedule+0x3d/0x90
[  884.635002]  [] schedule_timeout+0x2fc/0x600
[  884.635021]  [] ? mark_held_locks+0x66/0x90
[  884.635041]  [] ? _raw_spin_unlock_irq+0x2c/0x40
[  884.635059]  [] wait_for_completion+0xdc/0x110
[  884.635073]  [] ? wake_up_q+0x80/0x80
[  884.635091]  [] __synchronize_srcu+0x11e/0x1c0
[  884.635109]  [] ? 
trace_raw_output_rcu_utilization+0x60/0x60
[  884.635131]  [] synchronize_srcu+0x32/0x40
[  884.635145]  [] debugfs_remove_recursive+0x17d/0x190
[  884.635239]  [] ieee80211_debugfs_key_remove+0x1e/0x30 
[mac80211]
[  884.635333]  [] __ieee80211_key_destroy+0x1b3/0x480 
[mac80211]
[  884.635440]  [] ieee80211_free_sta_keys+0x117/0x170 
[mac80211]
[  884.635524]  [] __sta_info_destroy_part2+0x4c/0x200 
[mac80211]
[  884.635597]  [] __sta_info_flush+0x10d/0x1a0 [mac80211]
[  884.635706]  [] ieee80211_set_disassoc+0xcb/0x530 
[mac80211]
[  884.635802]  [] ieee80211_mgd_deauth+0x2e6/0x7b0 [mac80211]
[  884.635901]  [] ieee80211_deauth+0x18/0x20 [mac80211]
[  884.636024]  [] cfg80211_mlme_deauth+0x14f/0x3b0 [cfg80211]
[  884.636110]  [] nl80211_deauthenticate+0xe5/0x130 
[cfg80211]
[  884.636133]  [] genl_family_rcv_msg+0x1bc/0x370
[  884.636151]  [] ? genl_family_rcv_msg+0x370/0x370
[  884.636262]  [] genl_rcv_msg+0x80/0xc0
[  884.636275]  [] netlink_rcv_skb+0xa7/0xc0
[  884.636289]  [] genl_rcv+0x28/0x40
[  884.636303]  [] netlink_unicast+0x15b/0x210
[  884.636318]  [] netlink_sendmsg+0x31a/0x3a0
[  884.636335]  [] sock_sendmsg+0x38/0x50
[  884.636354]  [] ___sys_sendmsg+0x26c/0x280
[  884.636378]  [] ? ring_buffer_unlock_commit+0x32/0x290
[  884.636393]  [] ? __buffer_unlock_commit+0x1e/0x40
[  884.636407]  [] ? tracing_mark_write+0x162/0x2b0
[  884.636423]  [] ? __lock_is_held+0x49/0x70
[  884.636440]  [] __sys_sendmsg+0x45/0x80
[  884.636459]  [] SyS_sendmsg+0x12/0x20
[  884.636477]  [] entry_SYSCALL_64_fastpath+0x23/0xc6


johannes


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
On Thu, 2017-03-23 at 08:37 -0700, Paul E. McKenney wrote:

> I have not seen this, but my usual question for __synchronize_srcu()
> is if some other task is blocked holding srcu_read_lock() for that
> same srcu_struct.
> 

Not as far as I can see - but that was the scenario I was outlining in
my second email, I guess.

I'll need to reproduce it and get a fuller view of the system; I only
have the "echo w > sysrq-trigger" output right now.

Thanks,
johannes


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Paul E. McKenney
On Thu, Mar 23, 2017 at 03:54:46PM +0100, Johannes Berg wrote:
> Hi,
> 
> Before I go hunting - has anyone seen a deadlock in synchronize_srcu()
> in debugfs_remove() before? We're observing that with our (backported,
> but very recent) driver against 4.9 (and 4.10, I think), but there are
> no backports of any debugfs things so the backport itself doesn't seem
> like a likely problem.
> 
> sysrq-w shows a lot of tasks blocked on various locks (e.g. RTNL), but
> the ultimate problem is the wireless stack getting blocked on
> debugfs_remove_recursive(), in __synchronize_srcu(), in
> wait_for_completion() (while holding lots of locks, hence the other
> tasks getting stuck).

I have not seen this, but my usual question for __synchronize_srcu()
is if some other task is blocked holding srcu_read_lock() for that
same srcu_struct.

Thanx, Paul



Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Nicolai Stange
Hi Johannes,

On Thu, Mar 23 2017, Johannes Berg wrote:

> Before I go hunting - has anyone seen a deadlock in synchronize_srcu()
> in debugfs_remove() before?

Not yet. How reproducible is this?


> We're observing that with our (backported, but very recent) driver
> against 4.9 (and 4.10, I think),

Do I understand it correctly that this driver has been backported from
4.11-rcX to 4.9/10 and that there isn't any issue with 4.11-rcX?


> but there are no backports of any debugfs things so the backport
> itself doesn't seem like a likely problem.

Right, there haven't been any SRCU related changes to debugfs after
4.8.


> sysrq-w shows a lot of tasks blocked on various locks (e.g. RTNL), but
> the ultimate problem is the wireless stack getting blocked on
> debugfs_remove_recursive(), in __synchronize_srcu(), in
> wait_for_completion() (while holding lots of locks, hence the other
> tasks getting stuck).

Could you share a complete backtrace? For example, is the
debugfs_remove_recursive() called from any debugfs file's fops and thus,
possibly from within a SRCU read side critical section?


Thanks,

Nicolai


Re: deadlock in synchronize_srcu() in debugfs?

2017-03-23 Thread Johannes Berg
On Thu, 2017-03-23 at 15:54 +0100, Johannes Berg wrote:

> Before I go hunting - has anyone seen a deadlock in
> synchronize_srcu() in debugfs_remove() before?

Isn't it possible for the following to happen?

CPU1                                CPU2

mutex_lock(&M);
                                    full_proxy_xyz();
                                    srcu_read_lock(&debugfs_srcu);
                                    real_fops->xyz();
                                    mutex_lock(&M);
debugfs_remove(F);
synchronize_srcu(&debugfs_srcu);

-> deadlock?
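
A standalone reproducer would look roughly like the following sketch
(module, file name "foo", mutex M and the exit-time removal are all
invented to match the diagram above):

#include <linux/debugfs.h>
#include <linux/delay.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(M);
static struct dentry *F;

static ssize_t foo_read(struct file *file, char __user *buf,
                        size_t count, loff_t *ppos)
{
        /* CPU2: we run via the full proxy, i.e. inside debugfs_srcu. */
        mutex_lock(&M);         /* waits for the remover below */
        mutex_unlock(&M);
        return 0;
}

static const struct file_operations foo_fops = {
        .owner = THIS_MODULE,
        .open  = simple_open,
        .read  = foo_read,
};

static int __init foo_init(void)
{
        F = debugfs_create_file("foo", 0400, NULL, NULL, &foo_fops);
        return F ? 0 : -ENOMEM;
}

static void __exit foo_exit(void)
{
        /* CPU1 */
        mutex_lock(&M);
        msleep(5000);           /* window: start "cat .../foo" now; it blocks on M */
        debugfs_remove(F);      /* -> synchronize_srcu(&debugfs_srcu) waits for that reader */
        mutex_unlock(&M);       /* never reached */
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");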

I'm not convinced that this is the scenario I'm looking at, since then
I should see the mutex_lock() on CPU2 with a backtrace pointing to a
full_proxy and a debugfs operation I recognize. But lots of debugfs
files acquire locks, and it seems likely that they aren't always
removed without holding those locks?

Am I missing something? I'll see if I can add lockdep annotations.

johannes

