Re: system hang on EHCI IN transfer

David Brownell Thu, 27 Dec 2007 09:32:51 -0800

> From: "Pandita, Vikram" <[EMAIL PROTECTED]>
>
> On kernel 2.6.24-rc5 (from omap-git) I am testing the EHCI controller on 
> OMAP34xx.
> I have connected the device (netchip2280 + g_zero.ko ) to the EHCI controller.
>
> Device side test:
> On the gadget zero I run the IN test 6: ./testusb -a -t6 -s1024 -c100
>
> OMAP EHCI HOST status:
> On EHCI I can see that the test sometimes does not complete.
>
> The status of EHCI registers is as follows and the test does not complete. 
> The system is still up but no USB activity is happening.


What do you mean?  That you're watching the traffic on-the-wire,
and nothing at all is happening?  (I'm sure TI has a few sniffers
around, and if your lab is doing much USB stuff it should easily
be able to get its own.  TotalPhase has one at $US 1200 now...)


> # cat /sys/class/usb_host/usb_host1/async
> qh/ffc00100 dev3 hs ep1 42002103 40000000 (00001d00  data0 nak4)

That token looks suspicious.  Notice that its low byte is all zeroes:
neither the "active" flag nor any error flag is set ... that seems
like a "should not happen" case, and hence something the driver
doesn't know what to do with ... qh_completions() will see that
it's not active, qtd_copy_status() thin

If that's typical, you've got the start of a good handle on an
interesting puzzle.  (And I'd expect that there is indeed no traffic
on the wire, not even stuff that's getting NAKed.)  You could
add some diagnostic code to the QH scanning to handle that case.


>         ffc01840+in len=1024 04000d80 urb c7d9b6a0
>         ffc018a0#in len=1024 04000d80 urb c7d9b620
>         ffc01900#in len=1024 04000d80 urb c7d9b5a0
>         ffc01960#in len=1024 04000d80 urb c7d9b520
>         ffc019c0#in len=1024 04000d80 urb c7d9b4a0
>         ffc01a20#in len=1024 04000d80 urb c7d9b420
>         ffc01a80#in len=1024 04008d80 urb c7d9b3a0
>
>
> I suspect the problem is with EHCI on OMAP34xx,
> as the same netchip device setup works fine with a Dell PC's EHCI.

It could be a bug in the EHCI implementation you're using.  (Can
you say whose?)  But I wouldn't assume that's the most likely
case; after all, EHCI implementations should be mature by now.

One small clue to be aware of:  the fact that it works on "fast"
hardware doesn't mean there's not a race lurking that could show
up on slower hardware.

Until its guts were more or less rewritten for the 2.6 kernel
series, even OHCI had **lots** of little micro-races that would
only show up on embedded ARM cores, not on PCs.  It was a long
slog through that code to find and fix them.  And while the OMAP3
series may be faster than previous OMAPs, it's not as fast as
most five-year-old Dells waiting to be recycled ... so it'd be
a good place to notice races where a relatively faster EHCI
would cause problems.

I'm not saying there *are* such races, but just that existence
of one wouldn't surprise me.  (However, given some other oddball
reports, I do kind of expect one...)  EHCI is fast enough to make
such races show up more frequently than you might think.  The
fix might be as simple as adding a missing memory barrier, but
such stuff can be tricky to find.  You're fortunate to have what
seems to be a reproducible fault mode!


> Any help appreciated. How do I go about debugging the EHCI side? 

First make sense of that partial register dump.  Translate that async
schedule into English and tell us what it says about, for example,
the token bits in that QH.  Does the QTD overlay area agree with the
QTD itself?

Now, how could it have gotten that way?  (Remembering the QH is in
a dma-coherent memory region...)  And what was the last traffic to
occur *before* it stopped?  Maybe you can hook up trace hardware
to your OMAP board so it snoops all access (or at least, all CPU
access) to the QH pool, and notice when this trouble case shows up.

- Dave
