I hate top-posting, but I feel I must reply to David Brownell's latest mail in that way so as to convey some good news first, then go into the details.

Basically, David, your suggestion of "try small URBs" works! I am now
seeing 40 fps from my camera with URBs of 4K, 8K and 16K.

So I'd say that your first hypothesis below pans out nicely:

On Tue, 2005-02-15 at 23:09, David Brownell wrote:
> On Monday 14 February 2005 9:17 am, Steve Hosgood wrote:
> >
> > qh/cf49c100 dev2 hs ep2 42002202 40000000 (8a00ad80* data1 nak0)
> > da98f360 in len=0 00004d00 urb d6189b00
> > da98f480*in len=12288 b0008d80 urb d6189b00
> > da98f420+in len=20480 50000d80 urb d4452680
> > da98f540 in len=12288 b0008d80 urb d4452680
> > da98f4e0#in len=20480 50000d80 urb d6189900
> > da98f600 in len=12288 b0008d80 urb d6189900
> > ...
> >
> > Now for the non-working situation of 800x600 images @ 40fps:
>
> And this is "non-working" in that you verified, using a
> CATC or equivalent, that the host wasn't sending any IN
> tokens to the device at all?
>

I don't currently have access to a USB analyser, but last time I did
have one, then yes, the host wasn't sending IN tokens.

> > qh/cf49c100 dev2 hs ep2 42002202 40000000 (0400ad80* data0 nak3)
> > da98f360 in len=0 00004d00 urb d6189b00
> > da98f480*in len=12288 b0008d80 urb d6189b00
> > da98f420+in len=20480 50000d80 urb c2609a80
> > da98f540 in len=12288 b0008d80 urb c2609a80
> > da98f4e0#in len=20480 50000d80 urb d6189a00
> > da98f600 in len=12288 b0008d80 urb d6189a00
>
> The thing that's obviously striking here is that your urbs
> have 32KB buffers, so each one stretches into two qTDs:
> the first is five pages, the second is three. (And nothing
> looks obviously wrong in those parts of the queue heads.)

It's an odd split: why not 4 pages + 4 pages?

> Now while that's normally a fine thing, it's also a mode
> that doesn't usually kick in ... AND it's one where some
> shenanigans have to be used to patch up short reads. AND
> you're seeing problems after some short reads.
>
> Specifically, when the first TD (len=20480) triggers any
> kind of short read, it enters a special patch-up mode
> (flagged by the '#') where instead of going to the next
> TD, it goes to a magic "alternate" dummy that just stops
> the whole queue ... and the queue scanning logic has to
> detect that magic dummy, and restart it.
>
> ==> HYPOTHESIS: there's a bug in how the queue scanning
> handles that special case. It's very rare; QED.
>
> ==> PLEASE TEST BY: trying buffers of no more than 5KB
> in your bulk-IN urbs. This will completely avoid
> those special cases.
>

Well - as stated right at the top of this posting, you're obviously
quite right. I've tested with 4K buffers (less than 5K as you
suggested), but also with 8K and 16K buffers. It works fine in all
cases. I hope that the extra clues given by those facts might help you
to home in on the true underlying problem in the queue handling.

> If it checks out, there can be a couple ways to fix it.
> The "slow" way would just identify how that special casing
> is goofed, and resolve the issue. The "fast" way would be
> to use that "alternate" in a smarter way, so the hardware
> immediately jumps to the next URB. (And change how the
> single-URB unlink logic works, too.)
>
> Also
>
> ==> HYPOTHESIS: this is more hardware that behaves oddly
> with the "park" mode. A recent patch disabled that,
> improving behavior on some NForce2 boards.
>
> ==> TEST BY: using 2.6.11-rc4, which turns off that
> park mode.
>

I don't think I can do that today as I'm out of space on my hard disk
(doh!) and a bigger replacement is on order but hasn't turned up (yet).
If it turns up later today then of course I'll test with 2.6.11-rc4 for
you. Do you actually care any more, now that hypothesis #1 seems to
have been correct?

> In general, reporting test results with a vendor kernel is
> not going to be as useful as reporting ones with the most
> current www.kernel.org code ... unless you're dealing with
> the vendor.
>

Sorry.
I'll keep that in mind should there be a "next time".

-----------

In the meantime, might I say *thank you* to everyone on the list whose
contributions have gone into tracking down and finding this bug. I
suspect a proper fix for this anomaly will now be written in a matter
of hours and Linux's USB support will be the stronger for it.

For now, I'll live with the 16K URBs, but I will of course be more than
happy to compile and test proposed proper fixes for this "big URB/small
bulk reads" bug as they appear.

Steve Hosgood
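
For reference, the 20480 + 12288 split seen in the dumps is what one would
expect if each EHCI qTD carries five buffer page pointers of 4K each: a
page-aligned 32K buffer then fills five pages in the first qTD and leaves
three for the second, while 4K, 8K and 16K buffers each fit in a single qTD
and so never reach the two-qTD case. The sketch below only illustrates that
arithmetic. It is not the ehci-hcd code; the helper names, the 512-byte
maxpacket and the page-aligned start are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define QTD_PAGES 5u     /* buffer page pointers per EHCI qTD */
#define PAGE_SZ   4096u  /* size of the page each pointer addresses */

/* Bytes a single qTD can cover when the data starts `off` bytes into a page.
 * If the qTD cannot hold everything that remains, round down to whole
 * packets so a non-final qTD never ends mid-packet (assumed policy). */
size_t qtd_span(size_t off, size_t remaining, size_t maxpacket)
{
    size_t cap = (size_t)QTD_PAGES * PAGE_SZ - (off % PAGE_SZ);

    assert(maxpacket > 0 && maxpacket <= PAGE_SZ);
    if (remaining <= cap)
        return remaining;
    return cap - (cap % maxpacket);
}

/* Hypothetical helper: store the length of each qTD in out[] and return
 * how many were needed (at most max_out are recorded). */
unsigned split_into_qtds(size_t off, size_t len, size_t maxpacket,
                         size_t *out, unsigned max_out)
{
    unsigned n = 0;

    while (len > 0 && n < max_out) {
        size_t span = qtd_span(off, len, maxpacket);

        out[n++] = span;
        off += span;
        len -= span;
    }
    return n;
}

/* Print the split for one page-aligned buffer, assuming maxpacket = 512.
 * For len = 32768 this prints "32768: 20480 12288". */
void show_split(size_t len)
{
    size_t spans[16];
    unsigned i, n = split_into_qtds(0, len, 512, spans, 16);

    printf("%zu:", len);
    for (i = 0; i < n; i++)
        printf(" %zu", spans[i]);
    printf("\n");
}
```

Calling show_split() with 4096, 8192 and 16384 prints a single length each
time; only 32768 produces two, matching the sizes that work and the size
that does not.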