Re: [linux-usb-devel] [BUG] 2.6.3-bk9 Badness in ohci_endpoint_disable

David Brownell Mon, 01 Mar 2004 08:52:27 -0800

Martin Diehl wrote:

On Sun, 29 Feb 2004, David Brownell wrote:
* !!! trouble starts: bad entry reported from dl_reverse_done_list
 seems poisoned memory in this case (in many cases it's not 5a5a5a)
ohci_hcd 0000:00:0c.1: bad entry 5a5a5a50
That's really odd.  That poison byte is used by mm/slab.c, but
this "bad entry" came from a TD or ED, and hence from a dmapool.
And drivers/base/dmapool.c uses different poison bytes.  (Partly
to highlight strangeness like this ... )
As I said, most of the bad entry addresses aren't 5a5a5a-like; I just selected this one because I considered the slab poison byte remarkable too. In most cases they look more "normal".
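
To make the poison-byte comparison above concrete, here is a minimal sketch that checks whether a logged "bad entry" value looks like allocator poison. The byte values are as I recall them from mm/slab.c and drivers/base/dmapool.c of this era, so treat them as assumptions and verify against your own tree; the function name and the match threshold are hypothetical. The logged value ends in 0x50 rather than 0x5a, so the check only requires three of the four bytes to match.

```python
# Poison bytes, recalled from kernel sources of this era (assumptions;
# verify against mm/slab.c and drivers/base/dmapool.c in your tree).
POISON_BYTES = {
    0x5A: "slab: allocated, not yet initialized",
    0x6B: "slab: freed",
    0xA7: "dmapool: freed",
    0xA9: "dmapool: allocated, not yet initialized",
}


def classify_bad_entry(value, min_matches=3):
    """Guess whether a 32-bit "bad entry" value is allocator poison.

    Returns a description if at least min_matches of the four bytes
    equal the same known poison byte, otherwise None.
    """
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    for poison, meaning in POISON_BYTES.items():
        if octets.count(poison) >= min_matches:
            return meaning
    return None


if __name__ == "__main__":
    # The two "bad entry" values quoted in this thread.
    for entry in (0x5A5A5A50, 0x0AFBC0C0):
        print("%08x: %s" % (entry, classify_bad_entry(entry)))
```

With these assumed values, 5a5a5a50 is flagged as slab poison while afbc0c0 is not flagged at all, which matches the observation that most of the bad entries look like ordinary addresses.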


Right, I suspect there are actually multiple bugs showing up
here. One slab-related, another might be in OHCI, but given
those slab problems it's hard to know for sure.

Are you using uniprocessor x86, or something else?  "-mregparm"
build option?

THEORY:  somebody's abusing some general purpose kmem_cache_t,
maybe the one dmapool used for a "struct dma_page" holding TDs
("size-32"?).  That holds a dma_addr_t which may get handed
to the OHCI controller.  ....
Well, switching the same setup to EHCI using the HS-hub's TT, there are no problems even remotely comparable to this one.


Slab problems could show up with EHCI.  Other bugs might
not trigger very similar symptoms with EHCI.

--- or things go south like this:
The rude approach.  Whatever corruption happened was
not blatant enough to make the OHCI silicon die, so
things just mysteriously wedge.  We hope it didn't use
DMA to clobber pagetables or somesuch.
;-)

But AFAICS the box continues to work rock solid - it's just that usb became unusable. To me it looks like the corruption remains confined in pci_pool.


Depends how much memory you have, and what got corrupted.
There's no rule saying that corrupting a few 16-byte headers
must have immediate negative effects; statistically, it won't.

Is there anything we might learn from the td-hash dump?


Yes, it pretty much confirms that you're seeing a problem I
recently found out about.  I'm tracking it down; it might
be a real OHCI issue (not triggered by slab or dma_pool
corruption in recent kernels).

By the way:  thanks for the great debug information.  It
makes it a lot easier to sort through a complex problem!

- Dave

* using the device (ifup, start discovery)

irlap_change_speed(), setting speed to 9600


Does this do anything more than wait for a bulk packet to complete,
and then send another one?

* only a few seconds later

ohci_hcd 0000:00:0c.1: bad entry  afbc0c0
ohci_hcd 0000:00:0c.1: td_hash[0]
ohci_hcd 0000:00:0c.1:  entry td cafbc000; urb cb23a2e0 index 0; hw next td 0afbc040
ohci_hcd 0000:00:0c.1:      info 00140000 CC=0 (CARRY) DI=0 IN R
ohci_hcd 0000:00:0c.1:      cbp 0afbe020 be 0afbe020 (len 1)


That one would be the hub status IRQ transfer.  Your later post,
without the hub, didn't show this.

ohci_hcd 0000:00:0c.1: td_hash[4]
ohci_hcd 0000:00:0c.1:  entry td cafbc100; urb c8298ae4 index 0; hw next td 00000000
ohci_hcd 0000:00:0c.1:      info 03140000 CC=0 DATA1 DI=0 IN R
ohci_hcd 0000:00:0c.1:      cbp 08264000 be 08264fff (len 4096)

Did you recently issue an IN packet on this same endpoint?

ohci_hcd 0000:00:0c.1: bad? td cafbc0c0; urb 00000000 index 0; hw next td 0afbc100
ohci_hcd 0000:00:0c.1:      info 5e000000 CC=5 DATA0 DI=0 SETUP
ohci_hcd 0000:00:0c.1:      cbp 00000000 be 00000000 (len 0)


Those are the two that look suspicious.  And they look a lot less
cryptic once you know that I've seen almost the same "bad?" TD in
some unlink testing I've been doing, with pure bulk traffic,
paired with the strange "good" one (hwNextTD also null/bogus, that
should only happen with dummy TDs).

Notice how bad->hwNextTD points to the first/only td_hash[4] entry.
Everything except its hwNextTD and one "info" byte is zero ... the
"SETUP" comes from a bitfield that's zero.   And when I dumped just
a few more values from the TD, everything except bad->td_dma was
also zero.  As if a just-allocated TD got sent to the HC, and
it came back with CC=5/timeout because of course a bulk endpoint
isn't going to accept a SETUP packet.
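
For anyone following along with the dumps, here is a rough sketch of how the "info" word breaks down, following the general TD layout in the OHCI specification. It is an illustration only, not the driver's dump code; the function names and labels are hypothetical, chosen to resemble the log output.

```python
# Field positions follow dword 0 of an OHCI general TD.
# Labels imitate the dump style; they are illustrative only.
DIRECTION = {0: "SETUP", 1: "OUT", 2: "IN", 3: "reserved"}
# When the toggle field's high bit is clear, the toggle comes from the ED.
TOGGLE = {0: "(CARRY)", 1: "(CARRY)", 2: "DATA0", 3: "DATA1"}


def decode_td_info(info):
    """Split a 32-bit TD info word into its fields."""
    return {
        "cc": (info >> 28) & 0xF,          # condition code (5 = no response)
        "ec": (info >> 26) & 0x3,          # error count
        "toggle": TOGGLE[(info >> 24) & 0x3],
        "di": (info >> 21) & 0x7,          # delay interrupt
        "direction": DIRECTION[(info >> 19) & 0x3],
        "rounding": bool(info & (1 << 18)),  # the "R" flag in the dump
    }


def td_length(cbp, be):
    """Length as printed in the dump: be - cbp + 1, or 0 if cbp is zero."""
    return be - cbp + 1 if cbp else 0


if __name__ == "__main__":
    # The three info words quoted in this thread.
    for info in (0x00140000, 0x03140000, 0x5E000000):
        print("%08x %s" % (info, decode_td_info(info)))
    print(td_length(0x08264000, 0x08264FFF))
```

Only the top byte of 5e000000 is non-zero, so the direction field decodes as SETUP purely because it is zero, while 03140000 decodes as an ordinary IN transfer with buffer rounding.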

It actually looks as if the "good" and "bad" TDs swapped places
in part ... if the "bad" one were in the hashtable, and had the
non-zero fields initialized properly, that'd look like a pretty
normal IN transfer.   But that's almost too simple and obvious...
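
The link between the "bad?" TD and the td_hash[4] entry can also be checked mechanically. This sketch rests on two assumptions: a 32-bit x86 box where kernel virtual addresses are a fixed offset from bus addresses, and a hash function that I am recalling from ohci.h rather than quoting. Both happen to be consistent with the addresses and bucket numbers in this dump, but neither is confirmed by it.

```python
PAGE_OFFSET = 0xC0000000  # assumption: 32-bit x86, linear lowmem mapping
TD_HASH_SIZE = 64          # assumption: recalled from ohci.h


def virt_to_dma(virt):
    """Map a kernel virtual address to its bus address (assumed linear)."""
    return virt - PAGE_OFFSET


def td_hash(td_dma):
    """Bucket index for a TD's DMA address (assumed hash function)."""
    return (td_dma ^ (td_dma >> 6)) % TD_HASH_SIZE


if __name__ == "__main__":
    good_td = 0xCAFBC100      # the td_hash[4] entry
    bad_td = 0xCAFBC0C0       # the "bad?" TD
    bad_hw_next = 0x0AFBC100  # its "hw next td" value

    # bad->hwNextTD is the DMA address of the td_hash[4] entry.
    print(virt_to_dma(good_td) == bad_hw_next)
    # The reported "bad entry" is the DMA address of the "bad?" TD.
    print("%x" % virt_to_dma(bad_td))
    # Buckets the two TDs map to under the assumed hash.
    print(td_hash(virt_to_dma(good_td)), td_hash(virt_to_dma(bad_td)))
```

Under these assumptions the "bad?" TD would belong in bucket 3, right next to the entry it points at.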
