Re: [linux-usb-devel] PATCH: (as266) Lock devices during tree traversal

David Brownell Wed, 05 May 2004 08:01:55 -0700

And I would find it acceptable to use hubdev->serialize to protect the links at the inner nodes if only I could see how to also get usb_reset_device() working reliably. And useable from within probe().
I think usb_reset_device(dev) would just need to call locktree(dev),
then fail cleanly on -ENODEV returns.
As for usable from probe() ... I just had the germ of an idea.  How about
splitting out the "logical" topology lock (protecting SET_CONFIGURATION,
while it's probing every interface) from the "physical" topology lock
(hub->serialize)?   A single bit in "struct usb_device" would suffice
to prevent nested calls to the usb_set_configuration() routine, during
probe(), which would cause self-deadlock (and other chaos).
I've been considering possibilities along those lines as well, but they're still far too cloudy to mention here. Instead, here's a description of the picture I'm building up:

Locks will always go top-down. This automatically solves the "disconnecting a subtree and the whole tree simultaneously" problem.

Among others ... :)

When the thread starting from the top reaches the hub above the subtree
(i.e., the hub that reported a port connect change) it will find that hub
locked by the subtree-disconnect thread.  When the hub is unlocked the
entire subtree below the port will be gone, so the whole-tree disconnect
can proceed without difficulty.

Yes, as I'd essentially commented before.

The big question is how to make device resets work. It's pretty clear that usb_reset_device(udev) needs to lock both the parent hub and udev.


I don't quite see that.  During the brief period during enumeration when
(a) both devices exist, and (b) set_configuration hasn't yet succeeded,
it'd be in a retry loop which can re-create the device -- sure.

But during usb_reset_device(), there are basically two outcomes, neither
of which require the hub to be locked:  (i) device resets fine, and is
restored to same configuration and altsettings.  No topology changed.
(ii) reset fault, including "device morphed" and "error" cases, which
should trigger disconnect/cleanup.  (And then later re-enumeration.)

The problem is that the driver's disconnect() and suspend() routines will probably block waiting for usb_reset_device to return, so we can't simply acquire the semaphores in the usual way.


That seems to be your sticking point.  I think in both cases that'd
be an example of a driver bug, but I'll go into that in a different
message.

This approach ought to work for acquiring hub->serialize:
        for (;;) {
                if (udev->state != USB_STATE_CONFIGURED)
                        return -ENODEV;
                if (down_trylock(&hub->serialize))
                        break;
                wait_ms(100);
        }
Yes, this is a polling loop. It's not so bad, because device resets are very rare. Furthermore, when a driver does a device reset it expects to encounter a long time delay.


Delay and/or error.  That loop doesn't look so bad, though I'd not wait
as long as that ... the lock isn't usually going to be held that long,
except while debouncing a new port connection.

This won't deadlock with disconnect caused by unplugging the device (or
unbinding the hub driver!) because the device's state will be changed to
USB_STATE_NOTATTACHED.  It won't deadlock with disconnect caused by
rmmod'ing the driver because that won't lock the hub.  To prevent


But it'd lock an ancestral hub ... same net effect yes, unless the
locking isn't really top-down (from root hub)?

deadlocks with suspend, your patch would have to change the device's state before calling its suspend() routine -- or else the driver's suspend() routine would have to avoid waiting for usb_reset_device to return.


The latter is I think strongly preferable:  drivers just not doing
things that would deadlock.

Re changing state before suspend() ... right now I have OHCI changing
the root hub state to QUIESCING, so that new submissions will fail
cleanly during the period (sometimes almost 10 msec) needed to flush
pending transactions before entering global suspend.  (That state
hasn't really been used before.)  The alternative is to leave lots
of half-finished transactions in a state they can be cleaned up later,
even if the hardware state changes before resume.

(Incidentally, I don't see the need for usb_suspend_device() to call locktree(). Wouldn't it suffice to lock the hub and then check that it is still USB_STATE_CONFIGURED?)


That wouldn't handle concurrent operations as consistently.  Just at
the moment I don't have cycles to spare to think about alternatives.
I've nearly got remote wakeup (of the HC, using PME# on PCI!) working,
and want to finish making that behave.  Then maybe I could look at
some alternatives.

Next we need to lock udev itself. However, if some other thread already has locked udev (for suspend or unbind) then we generally want to fail. So a simple down_trylock() will suffice.

There are exceptions, though. What if udev is locked because a different interface is being bound or unbound? I don't know -- this comes under the heading of resetting a device with more than one interface, so let's ignore it.


I agree with not resetting devices controlled by more than one driver.
(Of which the simple version is "more than one interface".)  But I'm
not sure we should care _why_ something was locked ... at least, not
until we analyse all the reasons locks are needed!!

What if udev is locked because of a tree traversal?  Then we don't want to
fail; we just want to wait until the lock is released.  This is the
special reason I mentioned at the top of this email.  Now you see the
problem with making serialize into an rwsem: We want to acquire a write
lock, we want to wait if the lock is currently held for reading, and we
want to fail if the lock is currently held for writing.  Rwsems won't do
that.  But if tree traversal used a separate semaphore then everything
would be all right -- we could acquire udev->serialize and ignore the
topology semaphore since we aren't going to change any children[]
pointers.


Tree traversal would need to be _one_ mode of locking, that works
concurrently with others.  I'm afraid I don't see how that'd work
if each mode of locking used separate locks.  Each mode working
in isolation, no problem ... but the world is not so simple!!

The last problem is dealing with device resets during probe. The first part of this problem -- locking the hub -- might be easy. Is there any reason why usb_new_device() can't be called with the hub unlocked?


If SET_CONFIGURATION errors didn't enter a retry loop, then the
failure modes should just cause "normal" re-enumeration.  But the
downside is that they'd probably do it forever.  And regardless,
something (a simple per-device locked flag could suffice) would
need to prevent concurrent SET_CONFIGURATION calls.  That was
the "germ of an idea" I mentioned above.

The second part of the problem is trickier.  Do we need to hold
udev->serialize while probing?  Certainly we want to avoid suspend during

See "germ". :)

probe and vice versa.  Apart from that, perhaps it doesn't matter.  Or
perhaps we could adopt something along the lines of your suggestion, so
that usb_reset_device() would know if it was being called from within
probe and could skip acquiring the device lock.  (But what if it was
called by a driver for an interface other than the one being probed?)


Well, the issue would be "called during SET_CONFIGURATION", since
probe happens both then (during enumeration) or later during driver
binding.  A flag "I'm configuring this device" would solve that
issue with multi-driver devices (but not the other one, wherein the
drivers need to coordinate with each other about reset).

Or as a last resort, we can rule out resets during probe -- any driver that wants to do it will have to use a separate kernel thread or work queue. The disadvantage is that probe() will have to return before learning the result of the reset.


That was one of the options, corresponding to "how things work today".
I agree it'd be better not to have to rule that out.

- Dave

Alan Stern

------------------------------------------------------- This SF.Net email is sponsored by: Oracle 10g Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE. http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click _______________________________________________ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] PATCH: (as266) Lock devices during tree traversal

Reply via email to