On Mon, Mar 18, 2002 at 01:10:49AM -0800, David Brownell wrote:
> I distinctly recall that last time this came up I described the problem
> as somebody screwing up the device refcounts. I think it even came
> up with that same driver. That's the BUG() OHCI is reporting.
>
> Basically, it's illegal for anyone except "khubd" to get rid of devices
> in normal operations. "rmmod usb-ohci" is another thread which can
> remove devices (not a normal operation), and "apmd" will too in some
> cases. It's _never_ OK to do so in an interrupt context ... which is one
> reason why you got a BUG() when trying to do that illegal thing.
>
> Which kernel version were you using -- recent? Whose OHCI
> implementation? (I've never quite trusted OPTI.) Not that I have
> any reason to suspect OHCI in this case ... EVERY (**!!**) time
> I've seen these oopses, the problem has been in other kernel
> code that's screwed up the device refcounting
I'm using kernel 2.4.18, patched for XFS. I've upgraded at every
opportunity, on the off-chance the bug magically goes away, and I don't
have any uptime to preserve. ;-)
Realising that this magic disappearance is not the case, I hooked up a
serial console so that I can watch it crash, rather than just cursing
and pressing reboot. If there's anything I can get you to provide more
info, kdb output of some sort, installing lkcd, let me know.
> Looking at your first crash:
>
> > usbdevfs: USBDEVFS_BULK failed dev 2 ep 0x7 len 1696 ret -110
> > usbdevfs: USBDEVFS_BULK failed dev 2 ep 0x7 len 106 ret -110
>
> I could believe bugs in "usbfs", particularly matched with other
> failure modes, though for straight bulk I/O I wouldn't start suspecting
> that at first. Interesting that the second crash started in the same way.
>
> > hub.c: already running port 2 disabled by hub (EMI?), re-enabling...
> > usb.c: USB disconnect on device 2
>
> And it's curious that you were getting this failure mode (both times).
> Do you know why your ADSL modem fails then disconnects?
I'm afraid I don't. I don't know much about how the drivers work, except
that a modem_run process initialises the adapter, uploads firmware,
waits about until the device gets synchronised with the ADSL line, and
then forks into the background. I'm uncertain what modem_run does then,
whether it dies or stays around doing something helpful, but when the
connection drops in pppd (it runs a helper process, pppoa3, to feed the
data between the adapter and pppd) the modem_run process is invariably
dead.
When the machine crashes, or the modem connection drops (which sometimes
happens without killing the machine, but I have to kill off pppd,
modem_run, unmount usbdevfs, and remove usbcore and usb-ohci before I
can up the connection again) there is one flashing red LED on the
adapter, the USB (vs ADSL, the right hand one) indicator, which during
startup of the adapter means you just plugged it in, or during use,
indicates USB congestion. Neither of which are relevant, which leads me
to discount their two-page troubleshooting info. Of course, I'm not
using one of their supported OSes anyway. ;-)
Of note is that the USB adapter in my computer is an add-on card on the
PCI bus. I don't know if it's relevant or not, but the USB ADSL adapter
is said to have a dubiously high power utilisation. Seeing as the
crashes happen during times of high activity, this might be relevant. Or
it might be a complete red herring.
> > usbdevfs: process 253 (pppoa3) did not claim interface 1 before use
>
> Well, there's one bug in the user mode driver right there. Leads
> me to believe others may exist, though it's not supposed to be
> possible to crash things that way. (And while that's a minor bug,
> where there's one bug others are often lurking... :)
There is a vague possibility that modem_run and the ppp connection die,
and then pppd in it's semi-persistent mode tries to start another pppoa3
which tries to access the device, without it having been initialised and
kept open by modem_run. If this is so, it would explain the 'just turned
on' LED indicators on the adapter. Again, I may be totally wrong. Syslog
from the time of the crashes leaves a lot to be desired. Such as the log
entries. =|
That said, I'm probably wrong because that PID is associated with the
successful connections in my logs:
Mar 18 00:11:23 pi pppoa3[253]: ppp(d) --> pppoa3 --> modem stream
ready
Mar 18 00:11:23 pi pppoa3[254]: modem --> pppoa3 --> ppp(d) stream
ready
(it's actually very frequently pid 253, because all this machine does is
boots up, I log in, and run the same script to start the connection)
> Interesting that in your second crash, another process ("modem_run")
> was trying to access a different interface on the same device. Tight
> coordination between two processes isn't as easy as doing it between
> two threads, so I always suspect code using processes that way...
A quick question or two on #speedtouch on irc.openprojects.net reveals:
<Robot101> the other question, what does modem_run do after it's
initialised the adapter and gone into the background?
<francois> well it can say that the modem isn't in the right state. Like
usb stack not loaded correctly by the OS
<francois> it does nothing exept reading the interrupt endpoint
<francois> with modem_run we keep reading an endpoint that nobody else
read
<francois> but you can kill it if you want
<francois> in fact, we know that the modem want us to read it when there
is something on it
<francois> but after the sync, we don't get anything
<francois> except when you loose the sync (like unpluging the line)
> Hmm, I just thought of this ... perhaps this is a design bug in usbfs,
> in that it shouldn't do implicit claiming. Disconnect processing will
> get rid of interface claims, but if usermode doesn't know the device
> is going away (and usbfs doesn't remember it's shutting down),
> interfaces could get automagically re-claimed ... AFTER the kernel
> thinks the driver has already shut down! I'm not sure that's something
> usermode drivers could even defend against.
>
> > usbdevfs: USBDEVFS_BULK failed dev 2 ep 0x7 len 106 ret -110
> > usb-ohci.c: bus 00:0d.0 devnum 2 deletion in interrupt
> > kernel BUG at usb-ohci.c:898!
>
> This pair is odd. Perhaps there are some device disconnect races
> in the usbfs code ... since device 2 is getting disconnected (or had
> it finished?) nobody should have been submitting any more URBs,
> it shouldn't have been possible.
>
> I can very easily imagine buggy disconnect() code breaking device
> refcounting guarantees. In fact I think I remember finding and fixing
> more than a few of that kind of bug ... :) For example, I suspect if the
> disconnect() routine returned with URBs still pending, you'd see this
> kind of crash (and likely other kinds later, if you missed this one).
> OR if, as I just imagined above, if usbfs let processes reconnect
> and issue new requests after khubd invoked disconnect()...
I must admit that I didn't quite follow all that. Nevertheless,
hopefully you can work out what's going on, because I don't have a
snowball's chance in hell of doing so. =)
> >From the second crash log:
>
> > Received interrupts, len = 0
> > Received interrupts, len = 0
> > Received interrupts, len = 0
>
> Where is that "Received interrupts, len = 0" message coming from?
A quick grep shows it appears to come from the modem_run process of the
driver... hrm. Source at speedtouch.sf.net. ;-)
> - Dave
Thanks for your help,
Rob
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-users