Alan Stern wrote:


Some significant changes to the error recovery code were just accepted into Greg K-H's kernel tree. They will probably show up in the main distribution in 2.5.71, or you can get them right now via BitKeeper from bk://linuxusb.bkbits.net/usb-2.5. With these changes, the errors you see shouldn't stop things dead in their tracks (although they might cause a 10-second delay). Of course, that's just in theory -- who knows what will happen when you actually try it!


Of course, persisting in the face of errors is all very nice, but it would be better to eliminate the errors in the first place.

Yes, I noticed those patches ... their effect should be interesting.


I hope you're not saying it's OK for fault recovery code to oops,
since I'll strongly disagree.  **Faults happen** as do true errors
(hardware and software) ... fault recovery is essential.  I know
that stuff is hard to debug.

At this point, it's not clear to me that there's any real error
happening except in the recovery code.  I'm not saying there
isn't a second problem -- only that so far there's evidence of
just that one problem (in fault recovery).  And having the fault
recovery work right will certainly help turn up proof if there
really is some other problem.


There could also be an issue specific to those large-capacity Maxtor devices, since several people have reported problems with them (at high speed) that I don't recall showing up for other devices. Hard to say just now; it could just be that problems with those have gotten reported while problems with others haven't. Or that they're fast enough to bring new issues to the table.


I see this with kernels 2.4.21-rc2 and rc6 just the same. 2.5.70 is
even worse: it just stalls the access (very early on, not after
several hundred MB) without any log messages, and the CPU load shoots
up without any useful information showing up in "top". It happens only
with the EHCI driver; in full-speed mode I haven't yet been able to
reproduce this error (maybe due to the relaxed timing).

Well that 2.5.70 failure mode is curious...


It would be interesting to know if the 2.4.21 CPU load problem is related to the EHCI driver or the usb-storage driver.

Likely it's some kind of bad interaction, possibly related to those larger-capacity drives. I don't recall it being a true CPU load -- more of a "too much time in i/o wait states" problem.


Checking the ehci "async" and "registers" files (in sysfs)
could be useful.  The last time I saw a failure anything like
that, the issue was a deadlock inside storage+scsi, since the
EHCI driver had handed all requests back ("async" was empty)
and Alt-SysRq-T showed usb-storageN and scsi-ehN wedged.


That particular failure existed only in 2.5, and it has since been fixed.

Agreed, that original failure mode is fixed. But similar failures could still exist, as near as I can tell.


Turning on usb-storage logging isn't much use: I haven't seen any
timeout/oops with it enabled, probably because it changes the timings.


I'm still planning to add non-verbose error logging to usb-storage. But there are other things ahead of it. It'll get there eventually.

I might be able to dig up that patch I sent a while ago, which did that as well as laying the groundwork to use the driver model message framework. Key parts of the work are already done, and only need updating to match recent kernels.

- Dave


_______________________________________________
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel