On Sun, 29 Feb 2004, David Brownell wrote:
* !!! trouble starts: bad entry reported from dl_reverse_done_list seems poisoned memory in this case (in many cases it's not 5a5a5a)
ohci_hcd 0000:00:0c.1: bad entry 5a5a5a50
That's really odd. That poison byte is used by mm/slab.c, but this "bad entry" came from a TD or ED, and hence from a dmapool. And drivers/base/dmapool.c uses different poison bytes. (Partly to highlight strangeness like this ... )
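For triaging a pile of "bad entry" values by which allocator's poison they resemble, here is a minimal C sketch. It rests on several assumptions. Only the 0x5a slab byte is confirmed in this thread; the other three byte values are written from memory of mm/slab.c and drivers/base/dmapool.c and should be checked against your tree. The names classify_poison() and poison_kind_name() are made up for illustration. Only the upper three bytes are compared, because the reported value 5a5a5a50 does not have 5a in its low byte.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: byte values recalled from memory, verify against your tree. */
#define SLAB_POISON_UNINIT    0x5au /* slab: allocated but never initialized */
#define SLAB_POISON_FREED     0x6bu /* slab: object was freed */
#define DMAPOOL_POISON_FREED  0xa7u /* dmapool: block was freed */
#define DMAPOOL_POISON_ALLOC  0xa9u /* dmapool: allocated, not initialized */

enum poison_kind { POISON_NONE, POISON_SLAB, POISON_DMAPOOL };

/* True if the upper three bytes of value all equal byte. */
static int upper_bytes_are(uint32_t value, uint32_t byte)
{
    return (value >> 8) == byte * 0x010101u;
}

/* Hypothetical helper: guess which allocator's poison a value resembles. */
enum poison_kind classify_poison(uint32_t value)
{
    if (upper_bytes_are(value, SLAB_POISON_UNINIT) ||
        upper_bytes_are(value, SLAB_POISON_FREED))
        return POISON_SLAB;
    if (upper_bytes_are(value, DMAPOOL_POISON_FREED) ||
        upper_bytes_are(value, DMAPOOL_POISON_ALLOC))
        return POISON_DMAPOOL;
    return POISON_NONE;
}

/* Human-readable label for a classification result. */
const char *poison_kind_name(enum poison_kind kind)
{
    switch (kind) {
    case POISON_NONE:    return "no poison pattern";
    case POISON_SLAB:    return "slab poison";
    case POISON_DMAPOOL: return "dmapool poison";
    }
    assert(!"unhandled poison_kind");
    return "?";
}
```

Under those assumptions, 5a5a5a50 classifies as slab poison, while an ordinary-looking value such as 0afbc0c0 matches no pattern.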
As I said, most of the bad entry addresses aren't 5a5a5a-like; I just selected this one because I considered the slab poison byte remarkable too. In most cases they look more or less "normal".
Right, I suspect there are actually multiple bugs showing up here. One slab-related, another might be in OHCI, but given those slab problems it's hard to know for sure.
Are you using uniprocessor x86, or something else? "-mregparm" build option?
THEORY: somebody's abusing some general purpose kmem_cache_t, maybe the one dmapool used for a "struct dma_page" holding TDs ("size-32"?). That holds a dma_addr_t which may get handed to the OHCI controller. ....
Well, switching the same setup to ehci using the HS-hub's TT there are no problems even remotely comparable to this one.
Slab problems could show up with EHCI. Other bugs might not trigger very similar symptoms with EHCI.
--- or things go south like this:
The rude approach. Whatever corruption happened was not blatant enough to make the OHCI silicon die, so things just mysteriously wedge. We hope it didn't use DMA to clobber pagetables or somesuch.
;-)
But AFAICS the box continues to work rock solid - it's just that usb became unusable. To me it looks like the corruption remains confined to pci_pool.
Depends how much memory you have, and what got corrupted. There's no rule saying that corrupting a few 16-byte headers must have immediate negative effects; statistically, it won't.
Is there anything we might learn from the td-hash dump?
Yes, it pretty much confirms that you're seeing a problem I recently found out about. I'm tracking it down; it might be a real OHCI issue (not triggered by slab or dma_pool corruption in recent kernels).
By the way: thanks for the great debug information. It makes it a lot easier to sort through a complex problem!
- Dave
* using the device (ifup, start discovery)
irlap_change_speed(), setting speed to 9600
Does this do anything more than wait for a bulk packet to complete, and then send another one?
* only a few seconds later
ohci_hcd 0000:00:0c.1: bad entry afbc0c0
ohci_hcd 0000:00:0c.1: td_hash[0]
ohci_hcd 0000:00:0c.1: entry td cafbc000; urb cb23a2e0 index 0; hw next td 0afbc040
ohci_hcd 0000:00:0c.1: info 00140000 CC=0 (CARRY) DI=0 IN R
ohci_hcd 0000:00:0c.1: cbp 0afbe020 be 0afbe020 (len 1)
That one would be the hub status IRQ transfer. Your later post, without the hub, didn't show this.
ohci_hcd 0000:00:0c.1: td_hash[4]
ohci_hcd 0000:00:0c.1: entry td cafbc100; urb c8298ae4 index 0; hw next td 00000000
ohci_hcd 0000:00:0c.1: info 03140000 CC=0 DATA1 DI=0 IN R
ohci_hcd 0000:00:0c.1: cbp 08264000 be 08264fff (len 4096)
Did you recently issue an IN packet on this same endpoint?
ohci_hcd 0000:00:0c.1: bad? td cafbc0c0; urb 00000000 index 0; hw next td 0afbc100
ohci_hcd 0000:00:0c.1: info 5e000000 CC=5 DATA0 DI=0 SETUP
ohci_hcd 0000:00:0c.1: cbp 00000000 be 00000000 (len 0)
Those are the two that look suspicious. And they look a lot less cryptic once you know that I've seen almost the same "bad?" TD in some unlink testing I've been doing, with pure bulk traffic, paired with the strange "good" one (hwNextTD also null/bogus, that should only happen with dummy TDs).
Notice how bad->hwNextTD points to the first/only td_hash[4] entry. Everything except its hwNextTD and one "info" byte is zero ... the "SETUP" comes from a bitfield that's zero. And when I dumped just a few more values from the TD, everything except bad->td_dma was also zero. As if a just-allocated TD got sent to the HC, and it came back with CC=5/timeout because of course a bulk endpoint isn't going to accept a SETUP packet.
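For anyone decoding those "info" words by hand, here is a minimal C sketch. It assumes the general TD control word layout as I remember it from the OHCI spec: CC in bits 31:28, EC in 27:26, data toggle in 25:24, DI in 23:21, direction/PID in 20:19, rounding in bit 18. The struct, enums, and td_info_decode() are invented for illustration and are not the driver's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum td_toggle { TD_TOGGLE_CARRY, TD_TOGGLE_DATA0, TD_TOGGLE_DATA1,
                 TD_TOGGLE_UNKNOWN };
enum td_pid { TD_PID_SETUP, TD_PID_OUT, TD_PID_IN, TD_PID_RESERVED };

/* Hypothetical container for the decoded fields of one "info" word. */
struct td_info_fields {
    unsigned cc;            /* condition code, bits 31:28 */
    unsigned ec;            /* error count, bits 27:26 */
    unsigned di;            /* delay interrupt, bits 23:21 */
    enum td_toggle toggle;  /* data toggle, bits 25:24 */
    enum td_pid pid;        /* direction/PID, bits 20:19 */
    int rounding;           /* buffer rounding ("R"), bit 18 */
};

/* Split a TD "info" word into fields (ASSUMED layout, see above). */
void td_info_decode(uint32_t info, struct td_info_fields *f)
{
    assert(f != NULL);
    f->cc = (info >> 28) & 0x0f;
    f->ec = (info >> 26) & 0x03;
    f->di = (info >> 21) & 0x07;
    f->rounding = (info >> 18) & 0x01;

    switch ((info >> 24) & 0x03) {
    case 0:  f->toggle = TD_TOGGLE_CARRY;   break; /* toggle taken from the ED */
    case 2:  f->toggle = TD_TOGGLE_DATA0;   break;
    case 3:  f->toggle = TD_TOGGLE_DATA1;   break;
    default: f->toggle = TD_TOGGLE_UNKNOWN; break; /* value 1: not named here */
    }

    switch ((info >> 19) & 0x03) {
    case 0:  f->pid = TD_PID_SETUP;    break; /* an all-zero field reads as SETUP */
    case 1:  f->pid = TD_PID_OUT;      break;
    case 2:  f->pid = TD_PID_IN;       break;
    default: f->pid = TD_PID_RESERVED; break;
    }
}
```

Under that layout, 00140000 from the dump decodes to CC=0, carry toggle, IN, R set; 5e000000 decodes to CC=5, DATA0, and a PID field of zero, which is why a mostly-zeroed TD reads as "SETUP".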
It actually looks as if the "good" and "bad" TDs swapped places in part ... if the "bad" one were in the hashtable, and had the non-zero fields initialized properly, that'd look like a pretty normal IN transfer. But that's almost too simple and obvious...
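To make the "not in the hashtable" part concrete, here is a rough C sketch of a TD lookup keyed by DMA address. It is a simplification under stated assumptions, not the driver's code. The hash function is written from my memory of ohci.h (64 buckets, address XORed with itself shifted right by 6); the struct layouts and td_hash_add() are invented for illustration; and any masking of low-order address bits the real driver may do before the lookup is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: hash size and function recalled from memory of ohci.h. */
#define TD_HASH_SIZE 64
#define TD_HASH_FUNC(td_dma) (((td_dma) ^ ((td_dma) >> 6)) % TD_HASH_SIZE)

static_assert((TD_HASH_SIZE & (TD_HASH_SIZE - 1)) == 0,
              "TD_HASH_SIZE must be a power of two");

/* Hypothetical, stripped-down TD: just the DMA address and hash chain. */
struct td {
    uint32_t td_dma;     /* bus address the controller sees */
    struct td *td_hash;  /* next TD in the same bucket */
};

struct td_table {
    struct td *bucket[TD_HASH_SIZE];
};

/* Put a TD at the head of its bucket's chain. */
void td_hash_add(struct td_table *t, struct td *td)
{
    unsigned h = TD_HASH_FUNC(td->td_dma);

    td->td_hash = t->bucket[h];
    t->bucket[h] = td;
}

/* Map a DMA address back to its TD; NULL is the "bad entry" case. */
struct td *dma_to_td(const struct td_table *t, uint32_t td_dma)
{
    struct td *td = t->bucket[TD_HASH_FUNC(td_dma)];

    while (td && td->td_dma != td_dma)
        td = td->td_hash;
    return td;
}
```

With this hash, 0afbc000 and 0afbc100 land in buckets 0 and 4, matching the td_hash[0] and td_hash[4] entries in the dump. 0afbc0c0 would land in bucket 3, which the dump does not list, so the lookup comes back empty and the address is reported as a bad entry.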