On Tue, Jul 01, 2003 at 12:06:14PM -0400, Alan Stern wrote:
> > Jun 28 00:32:30 joehill kernel: SCSI error: host 1 id 0 lun 0 return code = 8000002
> > Jun 28 00:32:30 joehill kernel: ^ISense class 7, sense error 0, extended sense 0
> I think those particular error messages arise because your device has 
> problems handling a command telling it to lock/unlock its door.  

Does this violate a standard? Is there an appropriate error code the
device should give when it has no door to (un)lock? I can report this to
the manufacturer who I hope will fix it if the device isn't behaving
properly.

> Interesting.  What's happening is that after a while your device becomes
> comatose.  It stops responding to commands.  Linux's error recovery takes
> 50 seconds, at the end of which your device has been reset and it starts
> working again, though not for long.  That's why your throughput is so
> lousy.

I see. That makes a lot of sense. Looking at the log file, though, the
delay looks like it is consistently exactly 10 seconds?

e.g.:
Jul  1 17:07:06 joehill kernel: usb-storage: usb_reset_device returns 0
Jul  1 17:07:16 joehill kernel: usb-storage: queuecommand called
...
Jul  1 17:07:57 joehill kernel: usb-storage: usb_reset_device returns 0
Jul  1 17:08:07 joehill kernel: usb-storage: queuecommand called
... and so forth.
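
The gap between each reset and the next queued command can also be measured mechanically instead of by eye. The following is only a rough sketch: the helper names (`parse_ts`, `reset_gaps`) are made up for illustration, it assumes the usual syslog timestamp layout, and because syslog does not record the year, it assumes one.

```python
from datetime import datetime

def parse_ts(line, year=2003):
    # Syslog timestamps look like "Jul  1 17:07:06". The year is not
    # logged, so it is passed in as an assumption.
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")

def reset_gaps(lines):
    # Pair each "usb_reset_device returns" line with the next
    # "queuecommand called" line and return the delays in seconds.
    gaps = []
    reset_at = None
    for line in lines:
        if "usb_reset_device returns" in line:
            reset_at = parse_ts(line)
        elif "queuecommand called" in line and reset_at is not None:
            gaps.append((parse_ts(line) - reset_at).total_seconds())
            reset_at = None
    return gaps

sample = [
    "Jul  1 17:07:06 joehill kernel: usb-storage: usb_reset_device returns 0",
    "Jul  1 17:07:16 joehill kernel: usb-storage: queuecommand called",
    "Jul  1 17:07:57 joehill kernel: usb-storage: usb_reset_device returns 0",
    "Jul  1 17:08:07 joehill kernel: usb-storage: queuecommand called",
]
print(reset_gaps(sample))  # [10.0, 10.0]
```

Run against the full log, this would show whether the 10-second gap holds for every reset or only for the ones quoted here.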

> > Incidentally, if are able to isolate this to a firmware bug, my
> > expectation is that Neuros would be open to suggestions.
> This does look very much like a firmware bug.  It would be especially nice 
> if it were repeatable easily, without having to send 50 - 100 MB of data 
> back and forth first.

Based on your suggestions, I've noticed:

- I can copy very small amounts of data with no problem.  If I mount,
  copy 1M, and unmount, it works perfectly.

- At 5-10M it starts to slow down, but it's still within a reasonable
  time frame--e.g., 5 minutes for 10M.  If I mount, copy 10M, unmount,
  then remount, copy another 10M, etc., it's fairly usable.

- Over that amount, it becomes unusable.  Although I've yet to do a
  precisely scientific survey, it seems to me that it's not slowing down
  in a linear fashion--e.g., 20M is more than twice as long as 10M--but
  I'll have to do some more experimentation to see if this is true.
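
One way to test the non-linear slowdown more rigorously would be to time writes of increasing size and compare the seconds-per-megabyte figures. The sketch below rests on several assumptions: the function names are invented for illustration, the demo writes to a temporary directory rather than the real device, and `os.fsync` is used so that the page cache does not hide the time the device actually takes. To test the player itself, the directory would be replaced by its mount point.

```python
import os
import tempfile
import time

def time_write(path, megabytes, chunk=1024 * 1024):
    # Write `megabytes` MB of zeros to `path`, force it out to the
    # device with fsync, and return the elapsed wall-clock seconds.
    block = b"\0" * chunk
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(megabytes):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    return time.monotonic() - start

def time_writes(directory, sizes_mb):
    # Time one write per size, removing each test file afterwards.
    results = {}
    for mb in sizes_mb:
        path = os.path.join(directory, f"test_{mb}M.bin")
        results[mb] = time_write(path, mb)
        os.remove(path)
    return results

# Demo against a temporary directory; substitute the device's mount
# point (and larger sizes such as 1, 5, 10, 20) for a real test.
with tempfile.TemporaryDirectory() as d:
    for mb, secs in time_writes(d, [1, 2]).items():
        print(f"{mb} MB: {secs:.3f} s ({secs / mb:.3f} s/MB)")
```

If the slowdown were linear, the s/MB column would stay roughly constant as the size grows. A rising figure would support the impression that 20M takes more than twice as long as 10M.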

> By the way, I noticed that all the problems in your logs occurred with 
> writes to the device.  Have you noticed any similar problems with reads?

Reading apparently works perfectly. If I mount the device and read the
contents thousands of times, it doesn't slow down at all, and it unmounts
without complaint. No errors in the logfile either.

> One thing that could help a little would be if you also enable
> USB-debugging in addition to usb-storage debugging.  That's another kernel
> configuration option.  Try doing that, then capture a complete system log
> for the time when you get that dreadful slowdown; make sure it covers a
> period of at least a couple of minutes.  With that information, we may
> be able to say definitely that this is (or is not) a firmware issue.

I rebuilt with CONFIG_USB_DEBUG=y. I'm not seeing any additional
information in the logfile, however. I assume that usb_debug messages
would *not* report as "kernel: usb-storage:" but as "kernel: usb" or
something else... If I remove all the usb-storage lines, there's nothing
in syslog relating to the drive or the USB subsystem except this one
line:

Jul  1 17:04:22 joehill kernel: hub 1-0:0: port 2 enable change, status 110
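
For reference, the filtering described above (drop every usb-storage line and see what kernel messages remain) amounts to something like the following. It is a minimal sketch, and the function name is hypothetical. It only matches on the literal "kernel:" and "usb-storage:" prefixes seen in this log.

```python
def non_storage_kernel_lines(lines):
    # Keep kernel messages that did not come from the usb-storage driver.
    return [line for line in lines
            if "kernel:" in line and "usb-storage:" not in line]

sample = [
    "Jul  1 17:04:22 joehill kernel: hub 1-0:0: port 2 enable change, status 110",
    "Jul  1 17:07:06 joehill kernel: usb-storage: usb_reset_device returns 0",
    "Jul  1 17:07:16 joehill kernel: usb-storage: queuecommand called",
]
for line in non_storage_kernel_lines(sample):
    print(line)
```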

I looked around, and it doesn't look like I need to do anything more to
enable USB debug messages (e.g., a flag in the proc filesystem)--so does
this mean USB just has nothing to say?  

And if there are no more USB debug error messages, does this make it more
likely that this is a firmware issue?

Finally, is there any way to reduce the timeout interval for the kernel
when it appears the device isn't responding?  Also, when I get to a state
where it won't let me halt or reboot, do I have any options other than
killing the system? I've tried killing the parent task, removing the
module, going to runlevel 1, etc., and it still gets stuck--I'd love to
at least be able to escape more gracefully.

Thanks again--

--Adam
