On Friday 17 June 2005 5:12 pm, Matthias Urlichs wrote:
> Hi,
> 
> > > The question in this case would seem to be why the donelist processing
> > > wasn't catching the TD at the list head.  You might add a debug check
> > > calling ohci_dump_td() on the TD triggering that skip_ed branch; or
> > > maybe even ohci_dump_ed(verbose) to see the whole queue there.  If
> > > that's called a lot, then just dump it the first three or four times.
> 
> Well, I did that, but of course the problem now manifested itself in a
> different way: ... It gets worse, however: dumping the HCCA shows
> that the frame number is not incremented any more, despite HC_IS_RUNNING()
> being true: it's still == ed->tick -1.
> 
> I am not sure how to recover from that. Apparently the chip has wedged
> itself into a corner..?

Or someone's wedged it.  What drivers did you say were active at
the time?

Also, whose OHCI implementation?  As a rule, when an OHCI chip gets
wedged like that, it first issues an "Unrecoverable Error" (UE) IRQ.
Then the USB stack will notice the problem and shut down more or less
cleanly.

But there are some OHCI implementations (OPTi comes to mind, and SiS)
that won't issue UE.  You can tell from /sys/class/usb_host/usbN/registers:
on those chips there's no "UE" flag on the "intrenable" line.  If you're
using one of those it's normally no problem ... unless something goes
wrong, like this.  If you can temporarily switch over to a different
OHCI controller (add-in PCI cards are handy!) that supports UE, you
might find it a bit more congenial for chasing this kind of bug.
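
If you'd rather script that "intrenable" check than eyeball it, here's a
rough sketch in Python.  It assumes the line starts with the word
"intrenable", followed by a hex mask and then the names of the enabled
interrupts; the helper names and the sample text are made up for
illustration, not captured from real hardware.

```python
def intr_flags(registers_text, label="intrenable"):
    """Return the set of flag names on the line starting with `label`,
    or None if no such line exists (assumed layout: label, hex mask, flags)."""
    for line in registers_text.splitlines():
        fields = line.split()
        if fields and fields[0] == label:
            # Drop the label and the hex mask; the rest are flag names.
            return {f for f in fields[1:] if not f.lower().startswith("0x")}
    return None


def reports_ue(registers_text):
    """True if the intrenable line lists the UE (Unrecoverable Error) flag."""
    flags = intr_flags(registers_text)
    return flags is not None and "UE" in flags


# Illustrative samples only (hypothetical, not real controller output).
SAMPLE_WITH_UE = (
    "intrstatus 0x00000004 SF\n"
    "intrenable 0x8000005a MIE RHSC UE RD WDH\n"
)
SAMPLE_WITHOUT_UE = (
    "intrstatus 0x00000004 SF\n"
    "intrenable 0x8000004a MIE RHSC RD WDH\n"
)
```

To use it on a live box, read /sys/class/usb_host/usbN/registers (with N
replaced by your bus number) and pass the text to reports_ue().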


> Ideas appreciated.

My suspicion is that someone's writing over memory they don't own.
Strip your kernel down and get rid of extraneous components.

One way to shake such problems loose earlier is to turn on the slab
poisoning options (CONFIG_DEBUG_SLAB).  Doesn't always do it, but
it's the best bet in most cases.  Sometimes it helps to tell your
kernel to use less than the total physical memory on your box, so
things get reused more quickly.
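
Roughly, that setup looks like the fragment below.  The 128M figure is
an arbitrary example, not a recommendation; pick something well under
what the box really has.

```text
# kernel .config: slab poisoning lives under the kernel debugging options
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y

# kernel command line: limit usable RAM (example value only)
mem=128M
```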


> My brute-force idea would be to sample that frame 
> number every couple of jiffies and, if it hasn't changed, call
> ohci_restart() and hope for the best.  :-/

I'd avoid such things; if usbcore isn't involved in shutting
down and restarting the HCDs, it's going to get deeply confused
and start throwing tantrums.

- Dave


