On Tue, Jul 01, 2003 at 12:06:14PM -0400, Alan Stern wrote:
> > Jun 28 00:32:30 joehill kernel: SCSI error: host 1 id 0 lun 0 return code = 8000002
> > Jun 28 00:32:30 joehill kernel: ^ISense class 7, sense error 0, extended sense 0
>
> I think those particular error messages arise because your device has
> problems handling a command telling it to lock/unlock its door.

Does this violate a standard? Is there an appropriate error code the
device should give when it has no door to (un)lock? I can report this to
the manufacturer, who I hope will fix it if the device isn't behaving
properly.

> Interesting. What's happening is that after a while your device becomes
> comatose. It stops responding to commands. Linux's error recovery takes
> 50 seconds, at the end of which your device has been reset and it starts
> working again, though not for long. That's why your throughput is so
> lousy.

I see. That makes a lot of sense. Looking at the log file, though, it
looks like the delay is consistently exactly 10 seconds, e.g.:

Jul 1 17:07:06 joehill kernel: usb-storage: usb_reset_device returns 0
Jul 1 17:07:16 joehill kernel: usb-storage: queuecommand called
...
Jul 1 17:07:57 joehill kernel: usb-storage: usb_reset_device returns 0
Jul 1 17:08:07 joehill kernel: usb-storage: queuecommand called
...

and so forth.

> > Incidentally, if are able to isolate this to a firmware bug, my
> > expectation is that Neuros would be open to suggestions.
>
> This does look very much like a firmware bug. It would be especially nice
> if it were repeatable easily, without having to send 50 - 100 MB of data
> back and forth first.

Based on your suggestions, I've noticed:

- I can copy very small amounts of data with no problem. If I mount,
  copy 1M, and unmount, it works perfectly.
- At 5-10M it starts to slow down, but it's still within a reasonable
  time frame--e.g., 5 minutes for 10M. If I mount, copy 10M, unmount,
  then remount, copy another 10M, etc., it's fairly usable.
- Over that amount, it becomes unusable.

Although I've yet to do a precisely scientific survey, it seems to me
that it's not slowing down in a linear fashion--e.g., 20M takes more
than twice as long as 10M--but I'll have to do some more experimentation
to see if this is true.

> By the way, I noticed that all the problems in your logs occurred with
> writes to the device.
> Have you noticed any similar problems with reads?

Reading apparently works perfectly. If I mount the device and read the
contents thousands of times, it doesn't slow down at all, and it
unmounts without complaint. No errors in the logfile either.

> One thing that could help a little would be if you also enable
> USB-debugging in addition to usb-storage debugging. That's another kernel
> configuration option. Try doing that, then capture a complete system log
> for the time when you get that dreadful slowdown; make sure it covers a
> period of at least a couple of minutes. With that information, we may
> be able to say definitely that this is (or is not) a firmware issue.

I rebuilt with CONFIG_USB_DEBUG=y. I'm not seeing any additional
information in the logfile, however. I assume that usb_debug messages
would *not* report as "kernel: usb-storage:" but as "kernel: usb" or
something else... If I remove all the usb-storage lines, there's nothing
in syslog relating to the drive or the USB subsystem except this one
line:

Jul 1 17:04:22 joehill kernel: hub 1-0:0: port 2 enable change, status 110

I looked around, and it doesn't look like I need to do anything more to
enable USB debug messages (e.g., a flag in the proc filesystem)--so does
this mean USB just has nothing to say? And if there are no more USB
debug error messages, does this make it more likely that this is a
firmware issue?

Finally, is there any way to reduce the timeout interval for the kernel
when it appears the device isn't responding? Also, when I get to a state
where it won't let me halt or reboot, do I have any options other than
killing the system? I've tried killing the parent task, removing the
module, going to runlevel 1, etc., and it still gets stuck--I'd love to
at least be able to escape more gracefully.

Thanks again--

--Adam
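
P.S. For anyone who wants to check the 10-second figure against a longer
log, here is a rough Python sketch that measures the gap between each
"usb_reset_device returns" line and the next "queuecommand called" line.
The function names are just ones I made up for illustration, the year is
assumed to be 2003 because syslog doesn't record it, and the month
parsing assumes an English/C locale. The sample input is only the four
log lines quoted above.

```python
from datetime import datetime

# Sample input: the four usb-storage lines quoted in this message.
LOG = """\
Jul 1 17:07:06 joehill kernel: usb-storage: usb_reset_device returns 0
Jul 1 17:07:16 joehill kernel: usb-storage: queuecommand called
Jul 1 17:07:57 joehill kernel: usb-storage: usb_reset_device returns 0
Jul 1 17:08:07 joehill kernel: usb-storage: queuecommand called
"""


def parse_time(line, year=2003):
    """Parse the syslog timestamp at the start of a line.

    Syslog omits the year, so it is passed in as an assumption.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        "{} {} {} {}".format(year, month, day, clock), "%Y %b %d %H:%M:%S"
    )


def reset_gaps(lines):
    """Return seconds from each reset completion to the next queuecommand."""
    gaps = []
    reset_at = None
    for line in lines:
        if "usb_reset_device returns" in line:
            reset_at = parse_time(line)
        elif "queuecommand called" in line and reset_at is not None:
            gaps.append((parse_time(line) - reset_at).total_seconds())
            reset_at = None  # only count the first command after a reset
    return gaps


if __name__ == "__main__":
    print(reset_gaps(LOG.splitlines()))
```

On the four lines above this reports a gap of 10 seconds after each
reset. To run it on a real log, pass the lines of the syslog file to
reset_gaps() instead of LOG.splitlines().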