Make sure you're running a kernel with the kernel hacking "debug memory allocations" option enabled. Then when it pauses, copy the "async" file for that controller and send it to me. In fact, just make a copy of every file in the relevant sysfs directory (it'll be something like /sys/bus/pci/devices/00:09.2).
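One way to take those copies is sketched below, in case it helps. This is only an illustration: the function name `snapshot_dir` and the time-stamped directory layout are assumptions of mine, not anything usbcore provides. It reads each file by hand because some sysfs attributes are write-only or return an error on read, and those are simply skipped.

```python
import os
import time

def snapshot_dir(src, dest_root, stamp=None):
    """Copy every readable regular file in src into dest_root/<stamp>/.

    Hypothetical helper: 'stamp' defaults to the current time (HHMMSS)
    so each snapshot directory can be matched against the kernel log.
    """
    stamp = stamp or time.strftime("%H%M%S")
    dest = os.path.join(dest_root, stamp)
    os.makedirs(dest, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and links to directories
        try:
            with open(path, "rb") as f:
                data = f.read()
        except OSError:
            continue  # attribute is write-only or failed on read
        with open(os.path.join(dest, name), "wb") as out:
            out.write(data)
        copied.append(name)
    return dest, copied
```

Calling `snapshot_dir(controller_dir, "/tmp/ehci-snapshots")` each time things pause, where `controller_dir` is the controller's sysfs directory, would give one directory per sample.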
Done that -- I've attached a complete log and a tarball of the sysfs copies, each a directory with the time in its name so you can match it against the log.
Thanks. Some of this does look like hardware/firmware issues.
* The first problem seemed to be at about 38:37, where the EHCI async schedule looks completely normal ... _unless_ the "nak3" values in the ep1in qh really are _not_ changing. Those values can be recycled very quickly; two samples is rarely enough to show changes, four or five samples is better.
If they're not changing, then EHCI silicon is for some reason not polling for I/O. One way that might happen is if some "short-read" logic misfired -- patch in the works.
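A rough way to check whether anything about that qh changes across samples is sketched here. It deliberately does not parse the "async" file, since I'm not assuming its exact layout: it just pulls out the lines containing a marker string (the `needle` argument, e.g. "ep1in", is an assumption about what appears in the dump) and compares them between samples. The names `endpoint_lines` and `is_changing` are made up for the example.

```python
def endpoint_lines(text, needle):
    """Return the stripped lines of one snapshot that mention needle."""
    return [ln.strip() for ln in text.splitlines() if needle in ln]

def is_changing(samples, needle, min_samples=4):
    """True if the lines mentioning needle differ between any samples.

    Requires several samples, because the counters recycle quickly and
    two snapshots can easily happen to show the same value.
    """
    if len(samples) < min_samples:
        raise ValueError("need at least %d samples" % min_samples)
    first = endpoint_lines(samples[0], needle)
    return any(endpoint_lines(s, needle) != first for s in samples[1:])
```

A False result is only suggestive, not proof: identical samples could still be a value that wrapped around between snapshots, which is why more samples are better.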
* The second problem was at 38:42, where something called usbcore code that complains "control/bulk timeout".
This is curious; it doesn't seem to be associated with an EHCI device (no such request in the async schedule snapshots). If it's some other host controller, that may suggest some sort of electrical interference happening (why?). If you're using current code, usb-storage isn't making such calls any more...
* The third problem was at about 39:06, where the ep0out control transfer was clearly misbehaving. Its STATUS stage (IN) was just waiting, doing nothing ... but the SETUP stage had worked, as well as any DATA stage (both OUT packets).
Notice a pattern emerging: returning status IN to the host isn't working. But we know the OUT direction worked. This is on ep0out, vs the others being on ep1in.
No, devices are not allowed to make ep0 control requests wait until something else completes.
* Then at about 39:29 the TEST_UNIT_READY command (OUT) was accepted but the response (IN) timed out and was aborted. Only the one snapshot; can't really see if it was NAKing.
I'd hope that such requests wouldn't be accepted by hardware that's not ready to handle them, but unfortunately it seems some developers prefer to leave their OUT FIFOs in "accept packets till full" mode ... maybe that's what's happening.
* Fifth, at 39:34 two things happened: (a) bulk reset requested, and (b) more "control/bulk timeout" messages right away.
Now (a) timed out at 39:54, as with problem #3: there's one snapshot (39:37) showing the same symptom, IN transfer not happening ... same as 40:02 and 40:10, which show NAKing.
That is, it looks like the device really isn't responding. And on the other hand (b) is the same as problem #2. Both of those are rather suspicious.
So, two electrically suspicious issues (#2, #5b), two cases where it seems the device isn't handling control requests legally (#3, #5a), and what seems like a pattern of IN transfers not behaving correctly on _any_ endpoint once the first problem (#1) appears.
That's my interpretation of your data, at any rate.
- Dave