Re: [linux-usb-devel] PATCH: (as266) Lock devices during tree traversal

Alan Stern Mon, 03 May 2004 19:58:33 -0700

On Mon, 3 May 2004, David Brownell wrote:

> >>Just define usb_reset_device() so the calling task may not hold
> >>the dev->serialize lock, and have its first act be to get the
> >>lock for the hub.  That would imply that probe() paths would
> >>(still) not be able to call it directly though.  Maybe the reset
> >>should always be queued for later.
> > 
> > 
> > That might be workable; I will have to think it through.  Disallowing
> > resets during probe() might be a considerable inconvenience, though.  On
> 
> Except that it's never worked, so ... :)


Yeah, better difficult than impossible.

> > the other hand, for usb-storage Matt Dharm just recently floated the idea
> > of creating a work queue to perform device scanning in a separate process
> > from probe().  Since that's where the resets would take place, it would
> > solve this problem.
> 
> That would be good for usb-storage.  DFU-style devices

They may need some help.  I'm finding the difficulty of doing resets
during probe to be quite annoying.

> > Another question I need to think about: Would the reset have to acquire 
> > dev->serialize at all?  Probably yes, but if it does so blindly it's 
> > liable to deadlock with disconnect.  Presumably a driver's disconnect() 
> > routine won't return until the thread doing the reset has finished, but 
> > that will never happen if the thread is waiting for the serialize lock 
> > owned by the disconnect thread.
> 
> If all the locking obeys the same (correct, and presumably top-down)
> locking protocol, then deadlocks won't happen.  That'd include the
> code doing the reset ... which would see -ENODEV faults starting at
> some point, and then fail cleanly.

No, you've missed a key point.  Let's say disconnect and reset happen at
nearly the same time -- not unlikely, since a user frustrated at the
errors that the reset is trying to correct might very well unplug the
device.  khubd locks the hub, then the device, then calls the driver's
disconnect().  Meanwhile the error handler part of the driver has called
usb_reset_device() which is trying to lock the hub.

Your strictly top-down analysis would say: Okay, khubd will proceed, and
when it's done the device reset will proceed and fail with -ENODEV or
-ESHUTDOWN.  But that's wrong -- khubd can't proceed because the driver
_mustn't_ return from disconnect() until it is finished using the device,
i.e, until usb_reset_device() returns.  (In the case of usb-storage it
_can't_ return; scsi_unregister_host() will block until the error-handler
thread has finished with the device.)

What's needed is a way for usb_reset_device() to try to acquire the hub's
serialize semaphore in such a way that it can be interrupted and fail if
and when the device is marked DISCONNECTED.  Normal semaphores won't
permit something like that.  (I don't think sending a signal and using 
down_interruptible() is a good approach.)


> > P.S.: Here's a question for you to consider.  If the first thing
> > usb_disconnect() does is to go through all the devices in the subtree
> > rooted at the hub being disconnected and mark them all as
> > USB_STATE_DISCONNECTED, and if it wants to do this as quickly as possible
> > (hence not waiting to acquire any semaphores), how can it be sure the
> > topology will remain stable?  One possible solution, not elegant, is to
> > require that any process deleting a children[] pointer must hold the
> > global device-state spinlock as well as the global topology semaphore.
> 
> Last time we talked about this particular thing, I thought we agreed
> a spinlock protecting just usb_device->state would be correct.  Some
> routine along these lines:
> 
>      // internal to usbcore
>      void change_device_state(struct usb_device *dev,
>                               enum usb_device_state state)
>      {
>           static spinlock_t   lock = SPIN_LOCK_UNLOCKED;
>           unsigned long       flags;
> 
>           spin_lock_irqsave(&lock, flags);
>           if (dev->state != USB_STATE_UNATTACHED) {
>                 dev->state = state;
>                 /* hey, why not recurse right here? */
>           }
>           spin_unlock_irqrestore(&lock, flags);
>       }
> 
> I'm not quite sure what your concern is with children[].

Same as before.  If we recurse right there, we end up calling

        change_device_state(dev->children[i], USB_STATE_DISCONNECTED);

How do we know that between the time we make that call and the time the
inner invocation looks at (dev->children[i])->state, the child device
structure hasn't been deallocated?  Remember, we aren't allowed to acquire 
a topology semaphore.

Alan Stern




-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE. 
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] PATCH: (as266) Lock devices during tree traversal

Reply via email to