[linux-usb-devel] Re: USB locking problems

Alan Stern Fri, 07 May 2004 14:47:43 -0700

On Fri, 7 May 2004, David Brownell wrote:

> > First, let's acknowledge that in 2.6 we must require drivers not to 
> > interact with a device after their disconnect() has returned.  Also, let's 
> 
> What's "interact" though?  Not usb_put_dev(), clearly...
> I'd certainly agree that starting new I/O to any of
> the endpoints in the disconnected interface is forbidden,
> or changing its altsettings, etc.


I/O to any endpoints (which includes altsetting changes -- ep0), and
requests to the device's port on the parent hub, i.e.,
suspend/resume/reset.

> But so long as the device is physically present, control traffic
> (ep0in/ep0out) is possible.  And resetting is much the same:  neither
> ep0 traffic nor reset() operations have stronger life-cycle constraints
> than "device must still be connected".  Whether that's a good thing
> for drivers to do is a different question.  There has never been
> enforcement of any policy in those areas.

No, and there can't be because of the way things are structured now.  All
we can do is audit the drivers and try to make sure they will behave.

> > acknowledge that in the absence of significant changes to the SCSI core, 
> > if usb-storage has a device-reset pending then its disconnect() routine 
> > _can't_ return until the reset has finished.  My scheme takes these things 
> > into account.
> 
> See above re the reset(), though for argument I'll accept
> that usb-storage needs to work around SCSI (for now).

The scheme I outlined can be simplified once that workaround is no longer 
needed.

> > A key ingredient is to recognize that suspend/resume/reset don't need to 
> > lock the parent hub.  They're only going to make use of one port, and it 
> 
> You'll notice that "locktree" described itself as locking the port... :)
> 
> The reason it locks the parent hub, very briefly, is to force tasks
> to serialize.  The dynamics change a bit when you're concerned with
> devices that are hubs ... every one of those operations refers to
> a TREE of downstream devices.  Disconnect does too.  So if someone
> is suspending/resuming/resetting/disconnecting an upstream device,
> the downstream device must be prevented from starting until that
> upstream operation completes.

That argument, carried to its logical conclusion, says that we should
delay _any_ transactions to any device below a hub that's being suspended
or disconnected.  In fact things aren't quite that extreme.  Suspending a
hub will suspend the entire subtree below it first, so by the time the hub
itself gets suspended there won't _be_ any transactions going on, i.e.,
nothing to prevent.  The same is true for logical disconnection -- for
physical disconnection it obviously makes no difference what operations we
attempt on devices below the hub.


> > Serializing access to a port should be done without using a semaphore
> > or other simple synchronization primitive, because a process waiting
> > to use a port must be forced to abort with an error if the port's
> > child device is disconnected.  That's necessary to avoid deadlock with
> > disconnect:
> > 
> >     khubd other thread
> >     -----                           ------------
> >     acquire port                    lock device
> >                                     wait for port
> >     process disconnect notification
> >       on the port
> >     mark device as STATE_NOTATTACHED
> >     wake up port waiters
> 
> Where "port waiters" being a new notion, like the notion
> of splitting apart the "port" (downstream end of the link?)
> and the device (upstream end?) as separate entities.
> 
> Why separate entities for both ends of the hub<-->dev link?

They _are_ separate entities.  When the device is disconnected the port 
will remain in existence.  It can still generate connect-status-change 
reports and khubd still has to manage it.

> The "port waiters" notion looks like it it's probably worth
> pursuing, given the SCSI EH/disconnect issue.

Even without that issue.  Without some way to serialize access to a port, 
you might have khubd trying to react to the status changes caused by 
another thread doing a port reset, for example.


> > Using this approach, there's no need for a thread holding a child's lock 
> > to acquire the parent's lock -- no bottom-up locking.  Serializing access 
> > to a port can be done fairly easily:  Each hub can have a 32-bit bitmask 
> > indicating which ports are in use.  There can be a waitqueue of processes 
> > waiting for an in-use port; since there's not liable to be much contention 
> > a single global waitqueue should suffice for all devices.  Access to the 
> > bitmasks and the waitqueue is protected by a spinlock.
> 
> Or maybe atomic bitmasks with test-and-set, and wait_event().

Yes, that thought occurred to me too.  I had forgotten that waitqueue
heads include their own built-in spinlocks.

> > Next, let's consider the 3 paths for binding:
> > 
> >     (A) New device, khubd calls usb_set_configuration(),
> >             interfaces are probed.
> >     (B) Sysfs calls usb_set_configuration(), interfaces are
> >             unbound and then probed.
> >     (C) New driver is registered, driver model core probes all
> >             matching unbound interfaces.
> 
> There's also usbfs access, which should closely resemble (B).

Certainly.

> > Unbinding is similar.  Notice that during (A) the device's hub port has
> > already been acquired, but during (B) and (C) it hasn't.  Also, during (A) 
> > and (B) the device has been locked, but not during (C).
> 
> Modulo my proposal to eliminate that restriction for (A) and (B),
> removing the difference from (C).

I think we really should lock the device during configuration changes.  A
certain amount of mutual exclusion with tasks that want to traverse all
the interfaces in the device (like /proc/bus/usb/devices) is required.  
The bus rwsem might be sufficient for this, but it locks too much -- all
the USB trees.  In principle this mutual exclusion could be accomplished
using something other than ->serialize, but why make things more
complicated than they have to be?  I don't think we lose much by doing
regular device locking here.

> > It turns out we will also need to distinguish cases (A) and (B).  To do
> 
> Why?

I think I mentioned that somewhere lower down.  In (A) the hub port is 
already acquired; in (B) it isn't.  This affects what usb_reset_device() 
must do.

> > that, we introduce new fields into struct usb_device and struct
> > usb_interface (taking a suggestion of Dave's).  The usb_device field can 
> > take on the values
> > 
> >     USB_DEVICE_ATTACHING, USB_DEVICE_ATTACHED, USB_DEVICE_DETACHING
> > 
> > and the usb_interface field can take on the values
> > 
> >     USB_INTERFACE_UNBOUND, USB_INTERFACE_PROBING,
> >     USB_INTERFACE_BOUND, USB_INTEFACE_UNBINDING
> 
> Not unreasonable, though I don't yet see what those would achieve.
> I've wondered about SUSPENDING/RESUMING states too, but they have
> not seemed critical.  That's a lot of states to add; and most of
> them won't be needed.  On principle, I want to avoid adding new
> states ... and their consequent complexities.

I haven't gone through all the details of this.  It's not clear that all
those states are needed, for example, although they do form a logical set.  
My main reason for adding them was to let usb_reset_device() work in as
many contexts as possible.  One thing I am certain of is that the
interface field needed to deal with the usb-storage/SCSI-EH problem.  If
that problem goes away then the field can probably go too.  The device
field... that's a separate issue.  At the moment I forget the context in
which it's needed.

> > Device resets are the hardest part to get right, because of the
> > requirement that they not be allowed to block disconnect().  The key here
> > is to change usb_reset_device's API: along with the usb_device the caller
> > must also pass its usb_interface pointer.  Then a polling loop much like
> > what I discussed before can be used to acquire the device lock; if it
> > fails whenever the interface is not PROBING or BOUND that will prevent any
> > problems.
> 
> Why pass the device at all, if the interface is available?  :)

Because sometimes there _is_ no relevant interface.  For example, a reset 
performed through usbfs would not be related to any particular interface.

> That API change sounds a good thing, regardless of how any
> of the locking questions get resolved.

Luckily, at this point there are very few places that do device resets.  
It's easy to change all of them.


> > In addition, usb_reset_device() must be aware of the context in which it
> > is called.  If the device is already locked (as it would be during
> > probe()) we should avoid trying to lock it again.  If the device's hub's
> > port is already acquired (as it would be in (A)) we should avoid trying to
> > acquire it again.  But now all the information needed to make these 
> > determinations is available.  This means the device resets will work even 
> > during probe(), with no other modifications needed.
> 
> Hmm, wouldn't my own proposal for not holding usbdev->serialize also
> resolve that reset-during-probe issue?  I get the impression you didn't
> quite see what I was driving at with it.

I may not have followed all the details of what you said.  But for the 
reasons given above, I think we should hold ->serialize during config 
changes.  Did you already provide a way of countering my objection?  If 
you did then I missed it, sorry.

Alan Stern




-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to 
deliver higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

[linux-usb-devel] Re: USB locking problems

Reply via email to