Alan Stern wrote:
On Wed, 2 Apr 2003, David Brownell wrote:
...
Looking at the system state, there were no requests queued to the EHCI driver; it wasn't even doing async schedule processing any more. And the next level up was usb-storage, which seemed to be waiting for requests too ... meanwhile, the "dd" was blocked.
Alt-SysRq-T showed the "dd" blocking in the read; the relevant bits are appended. Disassembly shows that blk_run_queues+0x146 is a CLI instruction right after q->unplug_fn(q).
I'm hunting for suggestions as to what might be wrong, as well as ideally a fix ... :) This system will be wedged like this for another hour or two until I reboot it, in case there's other sleuthing to be suggested.
I can't make too much out of the stack traces you included, but I'm puzzled by the usb-storage section. If usb-storage is really just sitting around waiting for a request, then its stack should be pretty short, with almost no function calls. The main routine of the kernel thread is usb_stor_control_thread(), which calls down_interruptible() from its top level while waiting for a request to come. Is it reasonable that all the stuff in your stack dump is just garbage left from the call to down_interruptible()?
I think so. The "stack dump" code is pretty stupid, it just prints every address that looks like it's near a known symbol, so there are lots of false positives ... there's no intelligence to it; unlike a debugger, it's not looking at stack frames. (And of course the kernel plays games with those too, though in this case the kernel had frame pointers for it to use.)
The way I read those traces is that it was still waiting for someone to up() that semaphore by submitting a request.
If nothing better comes up, here's a suggestion. Compile usb-storage with debugging turned on and repeat your test using a serial console (and regular kernel logging turned off!). Of course this will generate lots and lots of logging output but only the last part will be relevant. And we can hope that timing changes caused by the serial output delays will not affect whatever it was you saw.
I generally find the usb-storage messages to be unusable; there's so much "this worked right, that too, hey so did this" going on (surely at least a dozen per request!) that any messages about things that went wrong get drowned in the noise. And those delays have *always* negatively affected timing-related bugs: slowdowns of two or three orders of magnitude. It'd be different if it were only reporting errors (or even just faults) -- those hardly ever happen, so printing out messages then wouldn't break anything.
So I don't think that suggestion can help. It was running for an hour, and getting the same amount of I/O done with the storage debugging messages would take a couple of days logging to disk (it turns 20+ Mbyte/sec request rates into kBytes/sec), and a lot longer logging to a serial console.
Anyway, there were no error messages from usb-storage, suggesting that nothing was going wrong there except when dealing with the higher level code. Although it's hard to know that for sure, since the success and failure messages are controlled by the same compile flag.
- Dave
Alan Stern
_______________________________________________
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
