Re: [linux-usb-devel] usbtest #10 with gadgetfs hangs

David Brownell Sat, 13 Jan 2007 16:48:14 -0800

On Monday 08 January 2007 5:55 am, Phil Endecott wrote:

> Test 0-9 and 11-13 pass.  Test 14 fails:
> 
> # strace ./testusb -D /proc/bus/usb/004/010 -t14 -c 15000 -s 256 -v 1
> ..
> open("/proc/bus/usb/004/010", O_RDWR)   = 4
> ioctl(4, USBDEVFS_IOCTL, 0xbf8df42c)    = -1 EOPNOTSUPP (Operation not 
> supported)
> 
> I thought this had worked before, but maybe I am getting confused with 
> my g_zero testing; I believe it's not expected to work with gadgetfs 
> and usb.c, is this right?


Right, that sample program doesn't implement the control requests needed
to support test #14.  That combo is not expected to work.  (Which is not
to say it couldn't be _made_ to work on some platforms.)


> Anyway the current issue is test 10, which fails more often than not.  
> Perhaps it passes one time in three.

That's not quite what I understood from you before.  Intermittent failure
means that some race has opened up.  The point of test 10 is exactly to
turn up such races, by issuing lots of back-to-back requests.  It's rather
perversely good at doing that.  :)

I tried this on an at91 system would say that it worked a bit more often
than not, but it would indeed fail.  And based on what you show below,
it looks like a similar failure.


> I have tried to spot a pattern;  
> maybe it is more likely to work when usb.c is first started, more more 
> likely to fail later.  It has certainly been known to fail on the first 
> run though.  Here are the symptoms when it fails.  [Note that the 
> output below is not all from the same run.]
> 
> # ./testusb -D /proc/bus/usb/004/010 -t10
> (Never completes, no output)
> dmesg:
> 
> Jan  8 12:37:38 egypt kernel: usbtest 4-4:3.0: TEST 10:  queue 32 
> control calls, 1000 times
> (no further output)

Actually if you unplug the device at that point you should get a short
burst of additional diagnostic output, saying among other useful data
just which subtest failed.

Also, unless this is with UHCI, enabling USB debugging will produce
files /sys/class/usb_host/usb_hostN/async showing the schedule of USB
transactions.  What you could do is snapshot that file when this hang
happens ... the top few entries for the endpoint for this test will be
informative, showing exactly which subtest saw trouble.

(In the case of test #1, the actual failure would likely have been
triggered by the preceding request, or maybe the one before that...)


I'm not sure what to make of this usbmon stuff.  All the urbs are
submitted into a queue, so that the responses look like they're
out of order.  What's needed is a bit of postprocessing to associate
the responses with the requests.


> c7848500 33910869 S Ci:024:00 s 80 06 0400 0000 0009 9 <
> c7848260 33911110 C Co:024:00 0 0

subtest #7 (by inference; should have stalled).  I don't see
a response for c7848500, or a request for c7848260.

> c7848260 33911112 S Co:024:00 s 02 01 0000 0000 0000 0
> c7848380 33911117 C Ci:024:00 0 2 = 0000

subtest #8 (by inference; should have stalled)

> c7848380 33911118 S Ci:024:00 s 82 00 0000 0000 0002 2 <
> c7848560 33911124 C Ci:024:00 -121 39 = 09022700 010304c0 01090400 
> 0003ff00 0005
> 0705 87020002 00070503 02000201

subtest #9 (by inference, the next one being #10)

> c7848560 33911125 S Ci:024:00 s 80 06 0200 0000 0400 1024 <
> c78481a0 33911235 C Ci:024:00 -32 0

This "S" was obviously subtest #10, but the STALL response coming
back for c78481a0 doesn't match any URB submitted in the subset
of this log you included.

It's clear that after that STALL was issued, something got
confused in terms of handling the next request ... what ever
it was.  This is not uncommon ... fault paths are trouble,
and test #10 makes a point of triggering them.  It even makes
a point of triggering several fault paths in a row, so that
bugs in cleanup of one are able to trigger additional bugs
when processing later ones.  :)


> c78481a0 33911236 S Ci:024:00 s 80 06 0500 0000 0009 9 <
> 
> 
> On the device end, usb -v output is:
> 
> /dev/gadget/net2280 ep0 configured
> serial="13hz78wkvlc3179o5r9bj6rbm28wotubury5m1jhsqxas9xbceylqit6yvj0tqx"
> CONNECT high speed
> DISCONNECT
> read 2 ep0 events
> CONNECT high speed
> SETUP 80.06 v0300 i0000 255
> SETUP 80.06 v0302 i0409 255
> SETUP 80.06 v0301 i0409 255
> SETUP 80.06 v0303 i0409 255
> SETUP 00.09 v0003 i0000 0
> CONFIG #3
> simple_source_thread start -1209627728 fd 4
> simple_sink_thread start -1218020432 fd 5
> SETUP 80.06 v0304 i0409 255
> SETUP 80.06 v0305 i0409 255

Those are GET_DESCRIPTOR requests for strings 4 and 5 in US-English.


> Then these two lines are repeated:
> SETUP 81.0a v0000 i0000 1

Which is a GET_INTERFACE request; that gets bounced up to
userspace, unlike most of the test10 requests (which get
handled directly in the kernel).

> SETUP 80.06 v0300 i0000 9

Which is a GET_DESCRIPTOR request for string zero

> 
> In one failing case the lines were output 1312 times, in another case 
> 1495 times, then no further output.  In another case they were output 
> only once.  When the test passes they are output 2000 times.  Normally 
> the last line to be output is the 80.06 line, but I have seen a case 
> where it was the other one.

Test #10 issues a set of about 15 requests in loop, a few thousand times.
So each of those messages should appear once per loop.
 

> I have tried strace -f on usb.c and its output finishes like this:
> 
> [pid  2177] open("ep-a", O_RDWR <unfinished ...>
> [pid  2178] open("ep-b", O_RDWR <unfinished ...>
> [pid  2177] <... open resumed> )        = 4
> [pid  2178] <... open resumed> )        = 5
> [pid  2177] write(4, "[EMAIL PROTECTED]", 18 
> <unfinished ...>
> [pid  2178] write(5, "[EMAIL PROTECTED]", 18 
> <unfinished ...>
> [pid  2177] <... write resumed> )       = 18
> [pid  2178] <... write resumed> )       = 18

Configured the endpoints; context switching in the middle.

> [pid  2177] write(4, 
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 7168 
> <unfinished ...>
> [pid  2178] read(5,  <unfinished ...>
> [pid  2176] <... sched_yield resumed> ) = 0
> [pid  2176] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> [pid  2176] read(3, "", 0)              = 0
> [pid  2176] --- SIGIO (I/O possible) @ 0 (0) ---
> [pid  2176] rt_sigreturn(0x3)           = 0
> [pid  2176] rt_sigtimedwait(~[RTMIN RT_1], 0, 0, 8) = 29
> [pid  2176] read(3, "\3\0\0\0\200\6\4\3\t\4\2\0", 60) = 12
> [pid  2176] write(3, "$\3", 2)          = 2
> [pid  2176] --- SIGIO (I/O possible) @ 0 (0) ---
> [pid  2176] rt_sigreturn(0x3)           = 2
> [pid  2176] rt_sigtimedwait(~[RTMIN RT_1], 0, 0, 8) = 29
> [pid  2176] read(3, "\3\0\0\0\200\6\5\3\t\4\377\0", 60) = 12
> [pid  2176] write(3, "\30\3S\0o\0u\0r\0c\0e\0/\0S\0i\0n\0k\0", 24) = 24

That was writing a control response ... 24 bytes, looks like
subcase #14, triggering a host-side short read.  That'd be
the GET_DESCRIPTOR case.


> [pid  2176] rt_sigtimedwait(~[RTMIN RT_1], 0, 0, 8) = 29
> [pid  2176] read(3, "\3\0\0\0\201\n\0\0\0\0\1\0", 60) = 12
> [pid  2176] write(3, "\0", 1)           = 1

This would seem to be the GET_INTERFACE case.


> [pid  2176] --- SIGIO (I/O possible) @ 0 (0) ---
> [pid  2176] rt_sigreturn(0x3)           = 1
> [pid  2176] rt_sigtimedwait(~[RTMIN RT_1],  <unfinished ...>

OK, that's interesting.  The failure I looked at on at91 was also
the GET_INTERFACE case (as I deduced by looking at the host side's
async message queue) ... and in that case, the byte never got sent
on the wire, since the host never saw it.

That's suggestive that the particular race in question is between
subtests 1 (gadgetfs returns config descriptor) and 2 (GET_INTERFACE).
Or at least _one_ of the races ... I'd guess there's a similar one
on the other request.

 
> I have also tessted with as823.patch reverted.  In this case I get:
> 
> # ./testusb -D /proc/bus/usb/004/032 -t10
> unknown speed /proc/bus/usb/004/032
> /proc/bus/usb/004/032 test 10 --> 121 (error 121)
> 
> Is this expected?

Yes.


> 
> Let me know what else I can do.

I'm thinking the failure is probably a race in how gadgetfs cleans
up after the (two) control requests handled in userspace.  You could
verify that hypothesis by collecting information about exactly which
requests fail (do the unplug, see what subtest dmesg says failed, say
if it's not one of the two we know gadgetfs handles), and how they're
failing (host side, does 'async' queue also show the response data never
arrived?).

What I'm going to do is look a bit more at what gadgetfs does when it
issues a control-IN response.

- Dave

 
> 
> Cheers,
> 
> Phil.
> 
> 
> 
> 
> 
> 
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> linux-usb-devel@lists.sourceforge.net
> To unsubscribe, use the last form field at:
> https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
> 

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] usbtest #10 with gadgetfs hangs

Reply via email to