On Friday 17 June 2005 5:12 pm, Matthias Urlichs wrote:
> Hi,
>
> > > The question in this case would seem to be why the donelist processing
> > > wasn't catching the TD at the list head. You might add a debug check
> > > calling ohci_dump_td() on the TD triggering that skip_ed branch; or
> > > maybe even ohci_dump_ed(verbose) to see the whole queue there. If
> > > that's called a lot, then just dump it the first three or four times.
>
> Well, I did that, but of course the problem now manifested itself in a
> different way: ... It gets worse, however: dumping the HCCA shows
> that the frame number is not incremented any more, despite HC_IS_RUNNING()
> being true: it's still == ed->tick -1.
>
> I am not sure how to recover from that. Apparently the chip has wedged
> itself into a corner..?

Or someone's wedged it. What drivers did you say were active at the
time? Also, whose OHCI implementation?

As a rule, when an OHCI chip gets wedged like that, it first issues an
"Unrecoverable Error" (UE) IRQ. Then the USB stack will notice the
problem and shut down more or less cleanly.

But there are some OHCI implementations (OPTi comes to mind, and SiS)
that won't issue UE ... in /sys/class/usb_host/usbN/registers, there
will be no "UE" listing in the "intrenable" line. If you're using one
of those it's normally no problem ... unless something goes wrong,
like this.

If you can temporarily switch over to a different OHCI controller
(add-in PCI cards are handy!) that supports UE, you might find it a
bit more congenial for chasing this kind of bug.

> Ideas appreciated.

My suspicion is that someone's writing over memory they don't own.
Strip your kernel down and get rid of extraneous components.

One way to shake such problems loose earlier is to turn on the slab
poisoning options (CONFIG_DEBUG_SLAB). Doesn't always do it, but it's
the best bet in most cases. Sometimes it helps to tell your kernel to
use less than the total physical memory on your box, so things get
reused more quickly.

> My brute-force idea would be to sample that frame
> number every couple of jiffies and, if it hasn't changed, call
> ohci_restart() and hope for the best. :-/

I'd avoid such things; if usbcore isn't involved in shutting down and
restarting the HCDs, it's going to get deeply confused and start
throwing tantrums.

- Dave
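
For anyone who wants to script the "is UE enabled?" check mentioned above, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names are made up for this example, and the parser assumes the "intrenable" line consists of that label, a hex value, and then whitespace-separated flag names. The only things taken from the message itself are the path /sys/class/usb_host/usbN/registers and the fact that "UE" is either listed on the "intrenable" line or not.

```python
import sys
from pathlib import Path


def intrenable_flags(registers_text):
    """Return the flag names on the 'intrenable' line, or None if no such line.

    Assumed layout (not verified against every kernel version):
        intrenable 0x<hex> FLAG FLAG ...
    """
    for line in registers_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "intrenable":
            # Drop the label and the hex value; the rest are flag names.
            return [f for f in fields[1:] if not f.lower().startswith("0x")]
    return None


def reports_ue(registers_text):
    """True only if an 'intrenable' line exists and lists the UE flag."""
    flags = intrenable_flags(registers_text)
    return flags is not None and "UE" in flags


def check_host(bus_number):
    """Read the registers file for one USB host (hypothetical helper)."""
    path = Path(f"/sys/class/usb_host/usb{bus_number}/registers")
    return reports_ue(path.read_text())


if __name__ == "__main__":
    # Bus number defaults to 1 purely as an example.
    bus = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    if check_host(bus):
        print(f"usb{bus}: UE is listed in intrenable")
    else:
        print(f"usb{bus}: no UE in intrenable; a wedged chip may go unreported")
```

Run it with the bus number as the only argument. A negative answer does not mean anything is wrong by itself; it only says that this controller is one of those that would not signal a wedge through a UE interrupt.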
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel