This disconnect() issue is a parallel of the open()/disconnect() issue. In both cases, there's state that must linger after disconnect() returns, and be cleaned up later. ...
....
Leaving that aside for the moment, you're arguing that usb_reset_device() should be allowed to block until after the driver is unbound from the device. Doesn't this violate the principle that once a driver's disconnect() has returned, the driver is not allowed to do _anything_ to a device? How is usb_reset_device() supposed to know that the unbinding happened and hence it should fail? It doesn't even know which interface(s) the driver calling it was bound to!
Hmm? If khubd removes the device, then usb_reset_device() only needs to see it's gone in order to fail. That's the case we've been discussing: usbcore involved. Since it's gone, all the fault processing boils down to: clean up, release reference. The caller of usb_reset_device() would see "-ENODEV".
The cases that could allow re-binding are a bit messier, since the device would be reset and the task doing unbind might not want that. (DFU enumeration would, though, which is why I sort of like the notion of having usbcore's probe logic do the reset, not drivers themselves.) All we could really do there, I think, is expect usbcore not to break; we can't exactly specify any sort of "smart" behavior. Or at least I can't think of one, beyond the DFU case.
usb_reset_device() doesn't take a reference to the driver's module. Hence there can't be any threads (like SCSI EH) still trying to use it.
That would be a driver bug: the EH thread would have taken an extra reference to the device, and certainly should have refcounted it before dropping the lock which allowed disconnect() to start.
(And maybe an extra reference to the driver module, but that sounds more like something SCSI should have done to usb-storage.)
Actually the SCSI core does both. The EH thread doesn't take any special references. To compensate, the core doesn't drop its references until the EH is through. (And BTW I named the wrong routine below; it should be scsi_remove_host(), not scsi_unregister_host().)
OK, good to know, though that would seem to be the root of one problem (and fixable).
... because disconnect() calls scsi_unregister_host() and that routine won't return until the EH has finished.
A similar observation applies. Although that one might be harder to resolve, since in this case it's SCSI that's placing curious synchronization problems on the rest of Linux. It's still not all that hotplug-friendly, I guess ...
The SCSI core still hasn't fully integrated the hotplug/reference-counting approach, as you say. Mike Anderson has been working on this for a long time, but the host registration part still has a ways to go.
Yeah, the SCSI core has been around longer than USB too. That means the code has a lot more ... "artifacts" ... that are tricky to change.
On the other side, maybe USB hasn't gone all the way either.
Clearly not! We don't even have suspend/resume working everywhere yet, plus the init/reset/enumeration issues are only now starting to change from how they started back in 2.2/2.4 kernels. Different "artifacts". I think we're further along, but clearly not "all the way" done.
Calling a driver's disconnect() is how we revoke the usb_device/interface pointer that was passed to probe(). If the driver has a thread blocked in usbcore somewhere, waiting on a semaphore, how can it tell that thread the usb_device pointer is now stale?
Well, at one level it's no different from _any_ other synchronization problem. If they share a pointer (say in driver_state->usbdev), access to that pointer needs to be locked, and disconnect() can null it out.
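To make the shared-pointer case concrete, here's a rough userspace sketch, not real driver code. It assumes a pthread mutex standing in for whatever lock the driver uses; struct fake_usb_device, struct driver_state and the driver_*() helpers are all made-up names for illustration, and -ESTALE is just one plausible choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct usb_device. */
struct fake_usb_device {
    int io_count;               /* counts successful "I/O" calls */
};

/* Hypothetical per-driver state shared by every task in the driver. */
struct driver_state {
    pthread_mutex_t lock;
    struct fake_usb_device *usbdev;   /* NULL once disconnect() has run */
};

static void driver_state_init(struct driver_state *st,
                              struct fake_usb_device *dev)
{
    assert(dev != NULL);
    pthread_mutex_init(&st->lock, NULL);
    st->usbdev = dev;
}

/* What disconnect() would do: revoke the pointer while holding the lock. */
static void driver_disconnect(struct driver_state *st)
{
    pthread_mutex_lock(&st->lock);
    st->usbdev = NULL;
    pthread_mutex_unlock(&st->lock);
}

/* Any other task: only look at the pointer while holding the lock. */
static int driver_do_io(struct driver_state *st)
{
    int ret = 0;

    pthread_mutex_lock(&st->lock);
    if (st->usbdev == NULL)
        ret = -ESTALE;          /* pointer was revoked by disconnect() */
    else
        st->usbdev->io_count++; /* safe: disconnect() can't run meanwhile */
    pthread_mutex_unlock(&st->lock);

    return ret;
}
```

The point is only that the pointer is never read or cleared outside the lock, so a task either does its I/O before disconnect() gets in, or sees NULL afterwards.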
If they don't share it, each task must have an independent refcount, and the driver will need to have some way to say "ESTALE!" to other tasks (like setting a driver_state->gone flag).
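For the non-shared case, something along these lines is what I have in mind. Again a userspace sketch under assumptions: C11 atomics stand in for the kernel's refcounting, and struct fake_device, struct gone_state and every function name here are hypothetical, not usbcore API.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted device; "released" marks the final put. */
struct fake_device {
    atomic_int refcount;
    bool released;
};

/* Hypothetical driver state carrying only the "gone" flag. */
struct gone_state {
    atomic_bool gone;
};

static void fake_device_init(struct fake_device *dev)
{
    atomic_init(&dev->refcount, 1);   /* the driver's own reference */
    dev->released = false;
}

static void gone_state_init(struct gone_state *st)
{
    atomic_init(&st->gone, false);
}

static void fake_device_get(struct fake_device *dev)
{
    atomic_fetch_add(&dev->refcount, 1);
}

static void fake_device_put(struct fake_device *dev)
{
    int old = atomic_fetch_sub(&dev->refcount, 1);

    assert(old > 0);
    if (old == 1)
        dev->released = true;         /* last reference dropped */
}

/* disconnect(): tell the other tasks, then drop the driver's reference. */
static void driver_side_disconnect(struct gone_state *st,
                                   struct fake_device *dev)
{
    atomic_store(&st->gone, true);
    fake_device_put(dev);
}

/* A worker that holds its own reference checks the flag before each use. */
static int worker_submit(struct gone_state *st)
{
    if (atomic_load(&st->gone))
        return -ESTALE;
    return 0;
}
```

The worker's own reference keeps the device structure valid until the worker drops it; the flag only tells it to stop. A worker that checked the flag just before disconnect() set it can still get one more request through, so this bounds the damage rather than closing the window.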
Periodically I think of two other things. First, that if we submitted URBs TO something that usbcore managed, like an endpoint, it'd be easy enough to have usbcore return -ESTALE. Second, that we're slightly overloading disconnect() ... it gets called (a) because of unplug, and also (b) because of a normal unbind, and those have different fault modes.
I don't think either of those two things is fixable in 2.6 kernels, but solving the first one could easily resolve the second.
- Dave