On 09/08/05 14:36, Alan Stern wrote:
> On Thu, 8 Sep 2005, Luben Tuikov wrote:
>
>
>>>No. There are only two reset mechanisms available for USB Mass Storage.
>>
>>I see.
>>
>>
>>>One is a class-specific reset command (which usb-storage issues when asked
>>>to do a device reset),
>>
>>So as far as I can see it sends a reset on the wire to the device?
>
>
> Sort of. It's not a low-level hardware reset, it's rather high-level.
> If the device's firmware is busy doing something else, or is hung, or
> doesn't implement resets correctly, then the reset won't work.
So this is perfect for LU Reset.
>>>and the other is a USB port reset (which
>>>usb-storage issues when asked to do a bus reset).
>>
>>As far as I see this resets the port on the host side and then
>>rediscovers the devices?
>
>
> That's right. (Except there's only one device to rediscover.) This is a
> much lower-level action than the other sort, and is much more likely to
> succeed.
So this is perfect for I_T Nexus Reset.
>>>Well, usb-storage doesn't need to "remove" a failed device. The SCSI core
>>>does a perfectly good job just by setting it off-line. And since there
>>>aren't other targets sharing the bus, that's good enough.
>>
>>Well what happens when I just unplug the device?
>
>
> For a brief time (maybe 1/4 second) all commands will return DID_ERROR
> and resets will return FAILED. Then usbcore will realize that the device
Perfectly ok.
> has been unplugged and will call the usb-storage disconnect method, which
> in turn will call scsi_remove_host.
This is fine. I haven't looked closely at the code, but note that
scsi_remove_host() should absolutely be called last.
>>What code? What reinvention?
>
> scsi_unjam_host plus the things that it calls. Why add a private version
> of all that to usb-storage when it already exists in scsi_error.c?
Because it is not implemented correctly.
Think about this:
EH is function of scsi_host.
You can define YOUR OWN EH! in usb storage, which you control!
So that when you want to remove the host you know what position
you're at. ;-)
>>You _need_ to implement a timeout and eh hook if you want to do this
>>in any sane way.
>
>
> No I don't. There already is a timeout in the SCSI core
> (scmd->eh_timeout); I don't need to add another one.
Eh, you're missing my point completely.
Hopefully one day you can see what I mean.
>>You also need to implement kref via kobject for the devices which you/USB
>>Storage
>>represent to SCSI Core.
>
> Again, I don't need to. The data used by usb-storage belongs to the
> private portion of the scsi_host, so it's already protected by the host's
> kref.
Well then, if your code is so correct, why are all these problems?
>> You also need to get rid of that horrible patchwork
>>of a solution: dev_semaphore. Maybe you can take a look in the SAS code and
>>see how this is all done?
>
> I assume you're referring to the way dev_semaphore is used in the
> bus_reset routine? It's sort of a preventative measure, because the SCSI
> core and the driver core haven't quite settled down yet.
"Settled down" -- you mean, litte design if ever has been done writing this
thing? Or do you mean: "one is waiting for the other to change in certain
ways to make it work"?
> Here's the idea. We want to protect against a race between the error
> handler doing a bus reset and the user doing rmmod usb-storage. With the
> current code, I believe scsi_remove_host does not wait for devices to go
> out of error recovery. Here is what might happen without dev_semaphore:
>
> Error handler thread Rmmod thread
> -------------------- ------------
> eh calls usb-storage's bus_reset
> -> calls usb_stor_port_reset
> -> calls usb_lock_device_for_reset
> -> calls usb_reset_device
> Driver core calls usb-storage's
> remove method
> storage_disconnect calls
> quiesce_and_remove_host
> -> locks & releases dev_semaphore (*)
> -> calls scsi_remove_host
> (doesn't wait for eh)
> -> returns to driver core
How can all this happen if a kref (module count) is upped by one,
due to SCSI Core using usb-storage?
> -> usb_reset_device does
> the port reset
>
> (This picture is over-simplified because the driver core does some locking
> of its own before calling the remove method. Let's not worry about that.)
>
> Now see what has happened. usb-storage's remove method has returned, so
> it has given up all claims to the device. It shouldn't do anything more
> that will affect the device. And yet in the eh thread it's about to do a
> port reset!
Yes, this is because eh should be in USB Storage.
>
> Since in reality the bus_reset routine acquires dev_semaphore, the rmmod
> thread will wait at (*) until the reset has finished. Alternatively, if
OMG!
> scsi_remove_host waited until the device was out of error recovery, then
> we would be okay without using dev_semaphore.
>
> The complication exists inside usb_lock_device_for_reset. That routine
> ends up calling down_trylock and looping with a timeout, because of the
> potential for a complicated deadlock that I have described before.
OMG!
> There's one more thing worth mentioning. Thanks to the resets carried out
> internally by usb-storage, we may decide it's easiest not to implement the
> bus_reset method at all! That would certainly solve all the problems with
> dev_semaphore. :-)
What would solve your problem is implementing your own eh and timer hook.
More so for transports such as USB.
Luben
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html