And I would find it acceptable to use hubdev->serialize to protect the links at the inner nodes if only I could see how to also get usb_reset_device() working reliably. And useable from within probe().
I think usb_reset_device(dev) would just need to call locktree(dev), then fail cleanly on -ENODEV returns.
As for usable from probe() ... I just had the germ of an idea. How about splitting out the "logical" topology lock (protecting SET_CONFIGURATION, while it's probing every interface) from the "physical" topology lock (hub->serialize)? A single bit in "struct usb_device" would suffice to prevent nested calls to the usb_set_configuration() routine, during probe(), which would cause self-deadlock (and other chaos).
I've been considering possibilities along those lines as well, but they're still far too cloudy to mention here. Instead, here's a description of the picture I'm building up:
Locks will always go top-down. This automatically solves the
"disconnecting a subtree and the whole tree simultaneously" problem.
Among others ... :)
When the thread starting from the top reaches the hub above the subtree (i.e., the hub that reported a port connect change) it will find that hub locked by the subtree-disconnect thread. When the hub is unlocked the entire subtree below the port will be gone, so the whole-tree disconnect can proceed without difficulty.
Yes, as I'd essentially commented before.
The big question is how to make device resets work. It's pretty clear that usb_reset_device(udev) needs to lock both the parent hub and udev.
I don't quite see that. During the brief period during enumeration when (a) both devices exist, and (b) set_configuration hasn't yet succeeded, it'd be in a retry loop which can re-create the device -- sure.
But during usb_reset_device(), there are basically two outcomes, neither of which require the hub to be locked: (i) device resets fine, and is restored to same configuration and altsettings. No topology changed. (ii) reset fault, including "device morphed" and "error" cases, which should trigger disconnect/cleanup. (And then later re-enumeration.)
The problem is that the driver's disconnect() and suspend() routines will probably block waiting for usb_reset_device to return, so we can't simply acquire the semaphores in the usual way.
That seems to be your sticking point. I think in both cases that'd be an example of a driver bug, but I'll go into that in a different message.
This approach ought to work for acquiring hub->serialize:
for (;;) { if (udev->state != USB_STATE_CONFIGURED) return -ENODEV; if (down_trylock(&hub->serialize)) break; wait_ms(100); }
Yes, this is a polling loop. It's not so bad, because device resets are very rare. Furthermore, when a driver does a device reset it expects to encounter a long time delay.
Delay and/or error. That loop doesn't look so bad, though I'd not wait as long as that ... the lock isn't usually going to be held that long, except while debouncing a new port connection.
This won't deadlock with disconnect caused by unplugging the device (or unbinding the hub driver!) because the device's state will be changed to USB_STATE_NOTATTACHED. It won't deadlock with disconnect caused by rmmod'ing the driver because that won't lock the hub. To prevent
But it'd lock an ancestral hub ... same net effect yes, unless the locking isn't really top-down (from root hub)?
deadlocks with suspend, your patch would have to change the device's state
before calling its suspend() routine -- or else the driver's suspend() routine would have to avoid waiting for usb_reset_device to return.
The latter is I think strongly preferable: drivers just not doing things that would deadlock.
Re changing state before suspend() ... right now I have OHCI changing the root hub state to QUIESCING, so that new submissions will fail cleanly during the period (sometimes almost 10 msec) needed to flush pending transactions before entering global suspend. (That state hasn't really been used before.) The alternative is to leave lots of half-finished transactions in a state they can be cleaned up later, even if the hardware state changes before resume.
(Incidentally, I don't see the need for usb_suspend_device() to call locktree(). Wouldn't it suffice to lock the hub and then check that it is still USB_STATE_CONFIGURED?)
That wouldn't handle concurrent operations as consistently. Just at the moment I don't have cycles to spare to think about alternatives. I've nearly got remote wakeup (of the HC, using PME# on PCI!) working, and want to finish making that behave. Then maybe I could look at some alternatives.
Next we need to lock udev itself. However, if some other thread already has locked udev (for suspend or unbind) then we generally want to fail.
So a simple down_trylock() will suffice.
There are exceptions, though. What if udev is locked because a different interface is being bound or unbound? I don't know -- this comes under the heading of resetting a device with more than one interface, so let's ignore it.
I agree with not resetting devices controlled by more than one driver. (Of which the simple version is "more than one interface".) But I'm not sure we should care _why_ something was locked ... at least, not until we analyse all the reasons locks are needed!!
What if udev is locked because of a tree traversal? Then we don't want to fail; we just want to wait until the lock is released. This is the special reason I mentioned at the top of this email. Now you see the problem with making serialize into an rwsem: We want to acquire a write lock, we want to wait if the lock is currently held for reading, and we want to fail if the lock is currently held for writing. Rwsems won't do that. But if tree traversal used a separate semaphore then everything would be all right -- we could acquire udev->serialize and ignore the topology semaphore since we aren't going to change any children[] pointers.
Tree traversal would need to be _one_ mode of locking, that works concurrently with others. I'm afraid I don't see how that'd work if each mode of locking used separate locks. Each mode working in isolation, no problem ... but the world is not so simple!!
The last problem is dealing with device resets during probe. The first part of this problem -- locking the hub -- might be easy. Is there any reason why usb_new_device() can't be called with the hub unlocked?
If SET_CONFIGURATION errors didn't enter a retry loop, then the failure modes should just cause "normal" re-enumeration. But the downside is that they'd probably do it forever. And regardless, something (a simple per-device locked flag could suffice) would need to prevent concurrent SET_CONFIGURATION calls. That was the "germ of an idea" I mentioned above.
The second part of the problem is trickier. Do we need to hold udev->serialize while probing? Certainly we want to avoid suspend during
See "germ". :)
probe and vice versa. Apart from that, perhaps it doesn't matter. Or perhaps we could adopt something along the lines of your suggestion, so that usb_reset_device() would know if it was being called from within probe and could skip acquiring the device lock. (But what if it was called by a driver for an interface other than the one being probed?)
Well, the issue would be "called during SET_CONFIGURATION", since probe happens both then (during enumeration) or later during driver binding. A flag "I'm configuring this device" would solve that issue with multi-driver devices (but not the other one, wherein the drivers need to coordinate with each other about reset).
Or as a last resort, we can rule out resets during probe -- any driver that wants to do it will have to use a separate kernel thread or work queue. The disadvantage is that probe() will have to return before learning the result of the reset.
That was one of the options, corresponding to "how things work today". I agree it'd be better not to have to rule that out.
- Dave
Alan Stern
-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE. http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel