On Mon, 30 Jun 2003, Adam Kessel wrote:

> When I mount or unmount the device, I see this in syslog (and on the
> console): 
> 
> Jun 28 00:32:30 joehill kernel: SCSI error: host 1 id 0 lun 0 return code = 8000002
> Jun 28 00:32:30 joehill kernel: ^ISense class 7, sense error 0, extended sense 0
> 
> As long as I don't read or write to the device, I can mount and unmount
> (in 2.5.73) repeatedly with no problems other than this error message.  

I think those particular error messages arise because your device has 
problems handling a command telling it to lock/unlock its door.  Obviously 
it doesn't have any sort of door to lock or unlock; the problem is that it 
doesn't return an appropriate error code.  In fact, it just says there's 
an error with no additional information at all; that's what the "sense 
error 0, extended sense 0" above means.  Since the SCSI driver doesn't 
know what the meaning of these errors is, it prints a log message.  You 
can just ignore it.

> I just tried copying a bunch of data to the device. After 50M or so, it
> slowed to a crawl. Initially, I thought it was crashed but eventually it
> started moving again, very slowly (probably around 10K-100K / sec).  The
> log from this operation is here: 
> 
> http://bostoncoop.net/adam/temp/log_while_copying

Interesting.  What's happening is that after a while your device becomes
comatose.  It stops responding to commands.  Linux's error recovery takes
50 seconds, at the end of which your device has been reset and it starts
working again, though not for long.  That's why your throughput is so
lousy.

> Here is the log from the point where I tried unsuccessfully to unmount
> the device: 
> 
> http://bostoncoop.net/adam/temp/unmount_log
> 
> Interestingly, while the unmount process was stuck, I ran df, and got: 
> 
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/hda2             18313076  17338432     44388 100% /
> /dev/sda1             18313076  17338432     44388 100% /mnt/neuros
> 
> /dev/sda1, the USB drive, is now returning to same size as the hard drive
> (in reality, it's a 128M drive, which is the capacity it returned before
> I started the unmount).  

I wouldn't trust that; filesystem information while an unmount is in 
progress is liable to be unreliable.

The unmount wasn't really stuck; it was moving forward but _very_ slowly.  
Of course, unmount causes the filesystem to flush its buffers.  All the
cached data waiting to be written to the device has to be sent out before
the unmount can complete.

Your device would crash, after 50 seconds the driver would reset it, and
then the device would accept another MB or so and crash again.  You can
see this happening repeatedly in the unmount_log file; just look for
places where the timestamp jumps forward by more than 1 second.  If you
had had enough patience, eventually everything would have gotten written
out and the unmount would have finished.  It might have taken a few hours, 
though...

> After several minutes where the unmount process was stuck, I finally
> removed the module, which caused lots of Buffer I/O errors, a filesystem
> panic, etc.. There is a call trace log as well.  Here is the log from
> the point of removing the module until the time it was finally done: 
> 
> http://bostoncoop.net/adam/temp/remove_module_log 

That looks about like I'd expect, given what you did.  Of course, the 
filesystem shouldn't panic, but that's a separate issue.

> Incidentally, if are able to isolate this to a firmware bug, my
> expectation is that Neuros would be open to suggestions.

This does look very much like a firmware bug.  It would be especially nice 
if it were repeatable easily, without having to send 50 - 100 MB of data 
back and forth first.

By the way, I noticed that all the problems in your logs occurred with 
writes to the device.  Have you noticed any similar problems with reads?

> > > Also, frequently the drive appears to be full when it's not.  Although
> > > it's a 128M memory card, it usually maxes out around 60-70M.  
> > What do you mean, appeared to be full?  What did 'df' show?
> 
> df showed 0M free... I can't replicate it now, so I don't remember what
> it thought the capacity was at the time. I suspect this is related to
> filesystem corruption on the drive rather than anything else... Space is
> occupied by files that were copied to the device but don't actually show
> up.

Okay, let's not worry about that for the time being.

> Hopefully the logs above will provide more insight into the situation.
> Is there debugging output I should send other than what usb-storage is
> putting in syslog?

In general, _everything_ that shows up in the log during the time 
intervals in question might be useful.  Don't filter just the usb-storage 
stuff (even though it will probably constitute the majority).

One thing that could help a little would be if you also enable
USB-debugging in addition to usb-storage debugging.  That's another kernel
configuration option.  Try doing that, then capture a complete system log
for the time when you get that dreadful slowdown; make sure it covers a
period of at least a couple of minutes.  With that information, we may
be able to say definitely that this is (or is not) a firmware issue.

Alan Stern



-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to