If that "a5a5a5.." recurs, you might try adding 64 or 128 bytes
to the "struct dma_page" in dmapool.c, forcing those structs into a
different slab cache (one not getting trashed).
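
To make the padding idea concrete, here is a minimal userspace sketch. It assumes a simplified allocator that rounds each request up to the next power-of-two size class; the struct fields, the names dma_page_sketch and dma_page_padded, and size_class() are all hypothetical and not copied from dmapool.c. The real slab size classes differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pool bookkeeping struct; field names
 * and types are illustrative only, not taken from dmapool.c. */
struct dma_page_sketch {
	void *prev, *next;       /* list linkage */
	void *vaddr;             /* CPU address of the page */
	unsigned long dma;       /* bus address of the page */
	unsigned in_use;         /* blocks currently handed out */
};

/* Same struct with debugging-only padding appended. */
struct dma_page_padded {
	struct dma_page_sketch page;
	unsigned char pad[128];
};

/* Assumed model: requests are rounded up to the next power of two,
 * with a 32-byte minimum. Real slab caches are not exactly like this. */
static size_t size_class(size_t n)
{
	size_t c = 32;

	assert(n > 0);
	while (c < n)
		c <<= 1;
	return c;
}
```

Under that model, size_class(sizeof(struct dma_page_padded)) is larger than size_class(sizeof(struct dma_page_sketch)), so the padded structs would come from a different cache than whatever is getting overwritten.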


It seems it never happens when the device is connected to a SiS USB-1.1 OHCI-HC! So it might even be a silicon issue with the nec-ohci. Or the pci-interface of this guy is somewhat faster, exposing some race with the hcd.

If you add a udelay() in your urb completion callback when the urb reports an unlink, does that make the problem vanish? Or printk("S") as I did.
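
A rough userspace sketch of that experiment, in case it helps. It assumes an unlinked urb completes with status -ENOENT or -ECONNRESET. struct urb_sketch, udelay_stub() and complete_sketch() are hypothetical stand-ins for the driver's real urb, udelay() and completion callback, and the 100 usec figure is arbitrary.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the one struct urb field used here. */
struct urb_sketch {
	int status;
};

/* Stub standing in for udelay(); it only records the requested delay. */
static unsigned long delayed_usecs;

static void udelay_stub(unsigned long usecs)
{
	delayed_usecs += usecs;
}

/* Assumption: these two status codes mean the urb was unlinked. */
static int urb_was_unlinked(const struct urb_sketch *urb)
{
	return urb->status == -ENOENT || urb->status == -ECONNRESET;
}

/* Completion callback sketch: stall only on the unlink path. */
static void complete_sketch(struct urb_sketch *urb)
{
	assert(urb);
	if (urb_was_unlinked(urb))
		udelay_stub(100);   /* arbitrary delay for the experiment */
	/* ... normal completion handling would follow here ... */
}
```

If the corruption goes away with the delay in place, that points at timing on the unlink path rather than at the data being transferred.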

I could imagine that "misc OHCI updates" patch of a few weeks
back speeding up an OHCI implementation because it had less
work to do (no periodic schedule dma), exposing such a race.


recently found out about this one.  I'm tracking it down,
it might be a real OHCI issue (not triggered by slab or
dma_pool corruption in recent kernels).


Hm, just speculation of course, but in td_submit_urb I see quite a number of writel - could it be there might be a readl missing somewhere to deal with posted writes on PCI?

I doubt it; the only values the HC should see as _changing_ are ed->hwTailP, in memory. Those PCI writes are just to make sure the HC _eventually_ reads the changed schedule, in case it wasn't already planning to do so. But if you like, have the tail end of that routine do a readl().
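
For anyone following along, the posted-write concern can be pictured with a toy model. This is not the OHCI driver or real MMIO: struct bridge_model, model_writel() and model_readl() are hypothetical names, and the single-register bridge is a deliberate simplification. It only illustrates that a write may sit in a buffer until a later read from the same device forces it out.

```c
#include <assert.h>
#include <stdint.h>

#define POSTED_MAX 8

/* Toy model of a bridge that may hold writes before the device sees them. */
struct bridge_model {
	uint32_t device_reg;          /* what the device actually sees */
	uint32_t posted[POSTED_MAX];  /* writes still in flight */
	int n_posted;
};

/* Deliver every buffered write to the device, in order. */
static void drain(struct bridge_model *b)
{
	for (int i = 0; i < b->n_posted; i++)
		b->device_reg = b->posted[i];
	b->n_posted = 0;
}

/* Modeled after writel(): returns as soon as the write is queued. */
static void model_writel(struct bridge_model *b, uint32_t val)
{
	assert(b->n_posted <= POSTED_MAX);
	if (b->n_posted == POSTED_MAX)
		drain(b);             /* buffer full, so push writes out */
	b->posted[b->n_posted++] = val;
}

/* Modeled after readl(): earlier posted writes reach the device first. */
static uint32_t model_readl(struct bridge_model *b)
{
	drain(b);
	return b->device_reg;
}
```

In this model a model_writel() followed by a model_readl() guarantees the device has seen the write, which is all a trailing readl() at the end of the submit routine would buy.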

It's the finish_unlinks() code that I suspect.  That's a racy
concept, and bugs still turn up from time to time.  So long as
there are two separate completion paths, I'll suspect more bugs
are lurking.


As you can see from the scheme above, there are almost always bulk-reads submitted. Usually they will complete very fast because the device is (mis-) designed to return actual_length==0 instead of NAK when there is no

And there I was thinking that "test 11" logic was completely un-representative of when any sane driver would unlink! :) It hasn't been getting run much lately, it seems. (Can't OSDL start doing that, or something?)


So it could be related to the bulk-unlink?

For me, yes. But yours was an interrupt transfer, yes? Synch vs async interrupt seems to be no issue, which is good.


Notice how bad->hwNextTD points to the first/only td_hash[4] entry.
Everything except its hwNextTD and one "info" byte is zero ... the
"SETUP" comes from a bitfield that's zero...


Ok. When I said it looked like control-transfer this was due to the SETUP packet assumption - (almost) all-zero TD would create this impression too, yes. But why should it have DATA1 in many cases? And look for the strange unaligned hw_next_td field in some of the additional examples below:

The hwNextTD unalignment looks suspiciously like (ED_C|ED_H), which sometimes the finish_unlinks() code needs to patch. And several of those "bad" entries pointed back to themselves ... yikes!

DATA1 could come from that just being the toggle when the HC wrote
back that word; it's in ED_C.
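
A small sketch of how one might check a dumped pointer word for those flag bits. It assumes the OHCI layout where an ED's head pointer word carries "halted" in bit 0 and "toggle carry" in bit 1, with TD addresses at least 16-byte aligned. Values are in host byte order here (the descriptors themselves are little-endian in memory), and PTR_MASK, td_addr() and looks_like_ed_flags() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define ED_H     0x1u      /* halted */
#define ED_C     0x2u      /* toggle carry */
#define PTR_MASK (~0xfu)   /* hypothetical name; TDs are 16-byte aligned */

/* Strip the flag bits to recover the TD address from a head pointer word. */
static uint32_t td_addr(uint32_t head_word)
{
	return head_word & PTR_MASK;
}

/* True when the low bits of a supposed TD pointer are nonzero and
 * consist only of ED_C and/or ED_H, i.e. the word looks as if it had
 * been written as an ED head pointer. */
static int looks_like_ed_flags(uint32_t next_td)
{
	uint32_t low = next_td & 0xfu;

	assert((ED_C & ED_H) == 0);   /* the two flags are distinct bits */
	return low != 0 && (low & ~(ED_C | ED_H)) == 0;
}
```

A next-TD value ending in 1, 2 or 3 would pass this check, and one with ED_C set would also account for the DATA1 toggle seen in the dumps.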

- Dave




