Re: [linux-usb-devel] BUG: Inconsistent lock state

Alan Stern Fri, 28 Jul 2006 22:39:48 -0700

On Wed, 26 Jul 2006, David Brownell wrote:

> On Wednesday 26 July 2006 8:29 am, Alan Stern wrote:
> > On Wed, 26 Jul 2006, David Brownell wrote:
> > 
> > > > The patch uses spin_lock_irqsave() and spin_unlock_irqrestore() along 
> > > > with
> > > > a call to wait_event().  What happens if you end up calling 
> > > > wait_event() 
> > > > with interrupts disabled?  Would it be better simply to spin?
> > > 
> > > You're ignoring that ugly
> > > 
> > >   if (in_interrupt())
> > >           return 0;
> > 
> > You're forgetting that in_interrupt() is different from irqs_disabled().  
> > In fact, either may be true independent of the other.
> 
> Although I don't think any of the current HCDs allow IRQs to be enabled
> while their handlers run ... so in_interrupt() implies irqs_disabled().
> In the more general case your observation is of course correct.


What the HCDs do is irrelevant.  My point was that it's silly to call
spin_lock_irqsave() and spin_unlock_irqrestore() in the same code block
with wait_event().  The in_interrupt() test doesn't help; it won't prevent
wait_event() from being called with interrupts disabled.  And if you did 
add an irqs_disabled() test then there wouldn't be any need for the 
_irq{save|restore}.

Furthermore, the fact that in_interrupt() implies irqs_disabled() while 
inside an HCD's handler is also irrelevant.  We might be inside some other 
handler, or not in any IRQ handler at all.


> > > Arguably the best thing to do on that branch -- control requests -- would
> > > be to make that the equivalent of "if (1)".  Control requests to root
> > > hubs are always synchronous (and often done with IRQs disabled),
> > 
> > Says who?  Can you point to any examples of control requests to root hubs 
> > that are done with interrupts disabled?
> 
> EHCI, OHCI, and UHCI all need to implement those requests with IRQs disabled,
> since they need to access registers protected by spin_lock_irqsave() ...

Misunderstanding...  When you said the control requests to root hubs are
often _done_ with IRQs disabled, I thought you meant the callers often
left interrupts disabled while _submitting_ the requests.  But you really
meant they are _implemented_ with interrupts disabled, which is true.

I agree that the cleanest way of handling unlink for root-hub control 
requests is to do nothing at all.  But this would lead to the problem 
described below.


> > > so the
> > > worst that would happen is that some other CPU would give the urb back
> > > shortly after that routine returns, rather than shortly before it does so.
> > 
> > That's _not_ the worst.  The code used to be that way, and I changed it
> > after it caused a crash.
> 
> I'm not following.  Which way -- equivalent to if(1)?

I forget exactly what the older code did, but I am certain that it did not 
include the sections copied from usb_kill_urb() -- that is, it did not 
wait for the control URB to complete.  That's the part I added.

> > The problem came up when there was a control URB running while the HCD
> > module was unloaded.  hcd_endpoint_disable() didn't wait for the URB to 
> > complete, the driver was unloaded from memory, and boom!
> 
> Hmm, the if(1) should leave the urb in ep->urb_list so hcd_endpoint_disable()
> would go through the loop again, retrying until the urb completed.  Or was
> there something else going on?  (It's late, sigh.)

Yes, the URB remained in ep->urb_list and hcd_endpoint_disable() did see
it when it went through the loop again.  But urb->status had been changed
to -ESHUTDOWN, so hcd_endpoint_disable() simply skipped over it and did
not keep retrying to unlink it.  (It would be a bug for usbcore to call a
dequeue method more than once for the same URB.)

The real problem was this: hcd_endpoint_disable() relies on 
hcd->driver->endpoint_disable() to wait until all URBs for an endpoint 
have completed.  Normally that's fine, but it doesn't work with URBs for 
endpoints on the root hub -- because these URBs never go through the 
hcd->driver->enqueue() pathway to begin with.

So we needed another way to wait for them.  My solution was to wait in 
usb_rh_urb_dequeue().  Maybe there's a better way.

> > Nowadays we can prevent the crash with appropriate refcounting.  But it
> > could still leave us calling hcd->driver->stop() while the control URB was
> > running.  Do you think it would be safe, given how non-standardized those 
> > HCD methods are?
> 
> I'm missing something.  Refcounting wouldn't solve that; the issue is that
> the URB wasn't yet given back, and thus is on ep->urb_list (with a refcount,
> sure, but _because_ it's on the list and thus still busy).  So stop() wouldn't
> be called ... if HCDs don't behave the same, that's clearly trouble.

No, you're confused.  (It must have been _very_ late.)

Note that hcd_endpoint_disable() doesn't wait for ep->urb_list to be 
empty.  It merely waits for hcd->driver->endpoint_disable() to return.

Refcounting would solve the problem of the HCD's module being unloaded 
from memory while its code was still in use.  It would also solve the 
related problem of the HCD's private data structure being freed while 
still in use.

But if hcd_endpoint_disable() fails to wait for root-hub URBs to complete,
then there is a possibility that stop() could be called while an URB was
still in progress.  (Actually, improvements to the hub driver have made
this much less likely.  Still, it's something to be aware of.)

Alan Stern


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] BUG: Inconsistent lock state

Reply via email to