On Friday 27 April 2007, Mike Nuss wrote:
> Sometimes upon removing one of our devices (for which we have a custom USB
> driver), OHCI fails to free all the associated resources with the device.
> The problem is always associated with the "IRQ INTR_SF lossage" message,
> which I assume is probably a hardware issue
As far as I can tell, yes, that's a hardware bug. The ramifications are
unclear. On the other hand, the issue has been seen on more than one
chipset, so that may be overly simplistic. Fortunately it's a rare error.

> (we are using the ZFMicro USB chipset, which has given us other headaches).

Yeah, there are already quirk workarounds for that chipset, specifically
related to unlinking ...

> Anyways, I realized that our disconnect() method was never getting called in
> these cases, and it's because there is a lockup in usb_hcd_endpoint_disable.
> Our device has two endpoints (plus control) - one for reads, and one for
> writes. When we hit this condition, it always hangs while disabling the read
> endpoint. I added a few lines of debug code to ohci-hcd.c and hcd.c to try to
> figure out what's going on, and traced it to the usb_kill_urb on line 1386 of
> hcd.c.
>
> Apr 27 18:50:43 blademan26 user.info kernel: usb 1-2.2: USB disconnect, address 18
> Apr 27 18:50:43 blademan26 user.debug kernel: usb 1-2.2: unregistering device
> Apr 27 18:50:43 blademan26 user.debug kernel: usb 1-2.2: usb_disable_device nuking all URBs
> Apr 27 18:50:43 blademan26 user.debug kernel: ohci_hcd 0000:00:13.0: shutdown urb c2aac180 pipe 40411280 ep2in-intr
> Apr 27 18:50:47 blademan26 user.warn kernel: ohci_hcd 0000:00:13.0: IRQ INTR_SF lossage
> Apr 27 18:50:47 blademan26 user.err kernel: ohci_hcd 0000:00:13.0: leak ed c3c5f500 (#82) state 0 (has tds)

ISTR this bug dates back to the 2.4 days. It happens very intermittently,
and never (any longer) to me ... after lots of driver fixes (little races)
in the early 2.5 days, I stopped being able to reproduce it.
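To make the "INTR_SF lossage" hang easier to picture, here is a toy model in C. It is a sketch under stated assumptions, not the ohci-hcd source: the controller may still be walking an endpoint descriptor (ED), so an unlinked ED is only retired, and its URBs given back, from a later start-of-frame (INTR_SF) interrupt, which normally fires every 1 ms frame. All names (struct model_ed, model_start_unlink, model_sof_irq, model_wait_for_giveback) are hypothetical, and the state names only loosely echo the driver's. The real usb_kill_urb() waits without a timeout; the frame bound here exists only so the model terminates.

```c
#include <stdbool.h>

/* Hypothetical toy model, NOT the ohci-hcd source: state names only
 * loosely echo the driver's ED_OPER / ED_UNLINK / ED_IDLE. */
enum model_ed_state { MODEL_ED_IDLE, MODEL_ED_UNLINK, MODEL_ED_OPER };

struct model_ed {
    enum model_ed_state state;
    bool has_tds;              /* URBs still queued on this endpoint */
};

/* Unlink request: the controller may still be walking this ED, so it
 * cannot be retired immediately; mark it and wait for the next SOF. */
static void model_start_unlink(struct model_ed *ed)
{
    if (ed->state == MODEL_ED_OPER)
        ed->state = MODEL_ED_UNLINK;
}

/* Start-of-frame (INTR_SF) handler: only now is it treated as safe to
 * retire the ED and give back its URBs. */
static void model_sof_irq(struct model_ed *ed)
{
    if (ed->state == MODEL_ED_UNLINK) {
        ed->has_tds = false;
        ed->state = MODEL_ED_IDLE;
    }
}

/* Simulate up to max_frames frames. Returns true if the URBs were given
 * back. If sof_arrives is false (the "lossage" case) the ED stays in
 * MODEL_ED_UNLINK, and an unbounded waiter would block forever. */
static bool model_wait_for_giveback(struct model_ed *ed, int max_frames,
                                    bool sof_arrives)
{
    for (int frame = 0; frame < max_frames; frame++) {
        if (ed->state == MODEL_ED_IDLE)
            return true;
        if (sof_arrives)
            model_sof_irq(ed);
    }
    return ed->state == MODEL_ED_IDLE;
}
```

With sof_arrives set, the ED is retired on the first simulated frame; without it, the URBs are never given back, which is the shape of the usb_kill_urb hang in the log. Calling model_sof_irq() by hand "works" in the toy, but the model cannot capture hardware that may still be touching the ED, which is where a real forced cleanup gets dangerous.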
> Apr 27 18:50:47 blademan26 user.err kernel: ohci_hcd 0000:00:13.0: free td c3e47640
> Apr 27 18:50:47 blademan26 user.err kernel: ohci_hcd 0000:00:13.0: freed
> Apr 27 18:50:47 blademan26 user.debug kernel: ohci_hcd 0000:00:13.0: urb list not empty <just after line 1373>
> Apr 27 18:50:47 blademan26 user.debug kernel: ohci_hcd 0000:00:13.0: kill urb: c2aac180 status -108 <just before line 1386>
>
> The call to usb_kill_urb never returns. What would cause that to happen?

The expected interrupt never appears. INTR_SF is supposed to happen every
millisecond. When it doesn't appear, something has gone *REALLY* wrong and
it's not clear how to recover. As noted by the comment near that "lossage"
diagnostic, this is a symptom of some fairly major problems.

From what I remember of those cases, it _may_ be that the only way to
recover is to reset the hardware, recycle all the TDs and EDs, then restart.
Certainly I tried a lot of the obvious workarounds, with no real success.

> It seems like at this point, we know the device is long gone, so there
> should be some way to force the issue.

ISTR trying the obvious "pretend INTR_SF happened" but seeing misbehavior
(likely oopsing, but that was quite a few years ago by now), which is why
the code still leaks that memory.

Someone who can reproduce this bug should try to fix it...

- Dave

_______________________________________________
linux-usb-devel@lists.sourceforge.net