On Tue, 1 Jul 2003, Adam Kessel wrote:

> On Tue, Jul 01, 2003 at 12:06:14PM -0400, Alan Stern wrote:
> > I think those particular error messages arise because your device has 
> > problems handling a command telling it to lock/unlock its door.  
> 
> Does this violate a standard? Is there an appropriate error code the
> device should give when it has no door to (un)lock? I can report this to
> the manufacturer who I hope will fix it if the device isn't behaving
> properly.

Yes, the device ought to report Illegal Request.

> > Interesting.  What's happening is that after a while your device becomes
> > comatose.  It stops responding to commands.  Linux's error recovery takes
> > 50 seconds, at the end of which your device has been reset and it starts
> > working again, though not for long.  That's why your throughput is so
> > lousy.
> 
> I see. That makes a lot of sense. Looking at the log file, though, it
> looks like the delay is consistently exact 10 seconds?
> 
> e.g.:
> Jul  1 17:07:06 joehill kernel: usb-storage: usb_reset_device returns 0
> Jul  1 17:07:16 joehill kernel: usb-storage: queuecommand called
> ...
> Jul  1 17:07:57 joehill kernel: usb-storage: usb_reset_device returns 0
> Jul  1 17:08:07 joehill kernel: usb-storage: queuecommand called
> ... and so forth.

You're only seeing part of it.  From your log_while_copying, starting from 
when one of the writes failed:

Jun 30 15:32:48 joehill kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
Jun 30 15:33:18 joehill kernel: usb-storage: command_abort called

That's a 30 second-delay waiting for the device to respond to the write 
command.

Jun 30 15:33:18 joehill kernel: usb-storage: Command TEST_UNIT_READY (6 bytes)
Jun 30 15:33:18 joehill kernel: usb-storage:  00 00 00 00 00 00
Jun 30 15:33:18 joehill kernel: usb-storage: Bulk command S 0x43425355 T 0x1cf Trg 0 
LUN 0 L 0 F 0 CL 6
Jun 30 15:33:18 joehill kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
Jun 30 15:33:18 joehill kernel: usb-storage: Status code 0; transferred 31/31
Jun 30 15:33:18 joehill kernel: usb-storage: -- transfer complete
Jun 30 15:33:18 joehill kernel: usb-storage: Bulk command transfer result=0
Jun 30 15:33:18 joehill kernel: usb-storage: Attempting to get CSW...
Jun 30 15:33:18 joehill kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
Jun 30 15:33:28 joehill kernel: usb-storage: command_abort called

That's a 10-second delay waiting to see if the device is all right.

Jun 30 15:33:28 joehill kernel: usb-storage: bus_reset called
Jun 30 15:33:28 joehill kernel: usb-storage: usb_reset_device returns 0
Jun 30 15:33:38 joehill kernel: usb-storage: queuecommand called

That's a 10-second delay to give the reset a little time to settle down.

> Based on your suggestions, I've noticed:
> 
> - I can copy very small amounts of data with no problem.  If I mount,
>   copy 1M, and unmount, it works perfectly.
> 
> - At 5-10M it starts to slow down, but it's still within a reasonable
>   time frame--e.g., 5 minutes for 10M.  If I mount, copy 10M, unmount,
>   then remount, copy another 10M, etc., it's fairly useable.

Don't think that "fairly useable" is okay.  Your device should not be 
slowing down at all.

> - Over that amount, it becomes unuseable.  Although I've yet to do a
>   precisely scientific survey, it seems to me that it's not slowing down
>   in a linear fashion--e.g., 20M is more than twice as long as 10M--but
>   I'll have to do some more experimentation to see if this is true.

Don't bother.

> > By the way, I noticed that all the problems in your logs occurred with 
> > writes to the device.  Have you noticed any similar problems with reads?
> 
> Reading apparently works perfectly. If I mount the device and read the
> contents thousands of times, it doesn't slow down at all, and it unmounts
> without complaint. No errors in the logfile either.

Well, that's half the battle won without having to do anything! :-)

> > One thing that could help a little would be if you also enable
> > USB-debugging in addition to usb-storage debugging.  That's another kernel
> > configuration option.  Try doing that, then capture a complete system log
> > for the time when you get that dreadful slowdown; make sure it covers a
> > period of at least a couple of minutes.  With that information, we may
> > be able to say definitely that this is (or is not) a firmware issue.
> 
> I rebuilt with CONFIG_USB_DEBUG=y. I'm not seeing any additional
> information in the logfile, however. I assume that usb_debug messages
> would *not* report as "kernel: usb-storage:" but as "kernel: usb" or
> something else... If I remove all the usb-storage lines, there's nothing
> in syslog relating to the drive or the USB subsystem except this one
> line:
> 
> Jul  1 17:04:22 joehill kernel: hub 1-0:0: port 2 enable change, status 110
> 
> I looked around, and it doesn't look like I need to do anything more to
> enable USB debug messages (e.g., a flag in the proc filesystem)--so does
> this mean USB just has nothing to say?  

No, you don't have to do anything else.  And there should be more than 
that, although maybe it only shows up during the system-startup part of 
your log.  But the most important question is what happens during those 
slowdown intervals.

> And if there are no more USB debug error messages, does this make it more
> likely that this is a firmware issue?

Yes it does.

In addition, there are a number of instances in your logs of things 
looking like this:

Jun 30 15:33:28 joehill kernel: usb-storage: usb_stor_control_msg: rq=ff rqtype=21 
value=0000 index=00 len=0
Jun 30 15:33:28 joehill kernel: usb-storage: Soft reset failed: -32

The first line is your computer sending a bulk-only reset command to the
drive, and the second line is the drive replying with a protocol stall,
which means it thinks the reset is an illegal command.  Clearly that's not
right -- it's another bug for you to report to the manufacturer.

BTW, did you check with Mike Bursell?  He said that he had applied a
firmware update that was related to his problems.  Maybe something similar
is happening with you.

> Finally, is there anyway te reduce the timeout interval for the kernel
> when it appears the device isn't responding?  Also, when I get to a state
> where it won't let me halt or reboot, do I have any options other than
> killing the system? I've tried killing the parent task, removing the
> module, going to runlevel 1, etc., and it still gets stuck--I'd love to
> at least be able to escape more gracefully.

Trying to muck around with the kernel to compensate for your device's 
shortcomings is _not_ a good idea.

When you do get stuck, your only realistic options are to remove the 
module or to unplug your device.  Probably either one would have the same 
drastic effect, though -- a filesystem panic isn't pretty.  I'm afraid 
there's not much that can be done about that at the moment.  There are 
still large parts of Linux which don't cope gracefully with device failure 
or hot-unplugging.

Here's one thing you could try.  I think you said earlier that your device 
has a FAT file system.  Instead of mounting it, you could set up the 
'mtools' package to use the device (see the documentation for 
/etc/mtools.conf).  You wouldn't be able to read and write files directly, 
but you could use mcopy to transfer files between the device and your hard 
disk, which might be almost as good.  And if it failed, maybe it wouldn't 
fail so violently.

Alan Stern




-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to