Sounds like your CDC hardware isn't as fast as your CATC hardware, or
else the CDC code is doing some strange things to slow transfers down.
You're getting half the throughput you should get on 10BaseT!

I didn't think the CATC code was using USB queueing ... though it does
seem to avoid stopping the network queue in most cases, which should
have some performance benefit.  (Read about the 2.4 "softnet" changes.)


I'm guessing that this throughput difference is at least partly due to
not using the magic called "bulk queueing".  I didn't pay enough
attention when the implementation was being discussed, and Google isn't
showing me any summary explanations.

Can someone provide a quick "once over" on the topic, including any "gotchas"?
Look at the kernel doc for usb_submit_urb(). Basically, you just
submit a bunch of urbs to an endpoint and they'll get processed in
order, without your driver needing to hand-hold between transfers.
Not unlike most other driver frameworks.
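
To make that concrete, here's a rough sketch of the pattern (all the
my_* names, the queue depth, and the buffer handling are made up for
illustration, and exact signatures vary between kernel versions):

#include <linux/usb.h>
#include <linux/workqueue.h>

#define NUM_RX_URBS     8       /* made-up queue depth; tune it */
#define RX_BUF_SIZE     1514    /* made up: one ethernet frame */

struct my_dev {                 /* hypothetical driver state */
        struct usb_device       *udev;
        unsigned                rx_ep;  /* bulk-IN endpoint address */
        struct urb              *rx_urb[NUM_RX_URBS];
        u8                      *rx_buf[NUM_RX_URBS];
        struct work_struct      clear_halt_work;
};

static void my_rx_complete(struct urb *urb);

/* queue several reads at once; the HCD runs them back to back,
 * in order, with no driver hand-holding between transfers */
static int my_start_rx(struct my_dev *dev)
{
        int i, status;

        for (i = 0; i < NUM_RX_URBS; i++) {
                struct urb *urb = usb_alloc_urb(0, GFP_KERNEL);

                if (!urb)
                        return -ENOMEM;
                usb_fill_bulk_urb(urb, dev->udev,
                        usb_rcvbulkpipe(dev->udev, dev->rx_ep),
                        dev->rx_buf[i], RX_BUF_SIZE,
                        my_rx_complete, dev);
                dev->rx_urb[i] = urb;
                status = usb_submit_urb(urb, GFP_KERNEL);
                if (status)
                        return status;
        }
        return 0;
}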

As for "gotchas", the only one that comes to mind is that when you
get errors that stall the endpoint (-EPIPE) your driver should be
ready to unlink all pending urbs (in the completion handler!); that
issue doesn't come up without queueing. All drivers should then
call usb_clear_halt() from some thread context, perhaps via
schedule_work(), before doing I/O again -- queueing doesn't change
that.
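
In code, continuing the sketch above -- note the unlink has to be the
asynchronous usb_unlink_urb(), since completion handlers run in
interrupt context and may not sleep (the workqueue signature here
follows the current API):

static void my_rx_complete(struct urb *urb)
{
        struct my_dev *dev = urb->context;
        int i;

        if (urb->status == -EPIPE) {
                /* the endpoint stalled: unlink everything else
                 * still queued.  usb_unlink_urb() is the async
                 * call, so it's legal in completion context;
                 * anything that sleeps is not. */
                for (i = 0; i < NUM_RX_URBS; i++)
                        if (dev->rx_urb[i] && dev->rx_urb[i] != urb)
                                usb_unlink_urb(dev->rx_urb[i]);
                schedule_work(&dev->clear_halt_work);
                return;
        }

        /* ... normal completion path, shown further below ... */
}

/* runs in thread context via schedule_work(), so it may sleep */
static void my_clear_halt(struct work_struct *work)
{
        struct my_dev *dev = container_of(work, struct my_dev,
                        clear_halt_work);
        int i;

        /* a real driver would first make sure all the unlinks
         * above have finished before reusing the urbs */
        usb_clear_halt(dev->udev,
                        usb_rcvbulkpipe(dev->udev, dev->rx_ep));
        for (i = 0; i < NUM_RX_URBS; i++)
                usb_submit_urb(dev->rx_urb[i], GFP_KERNEL);
}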


If you don't queue, then between each urb from your driver and the next,
the CPU has to wait for:

- an IRQ to be issued (on average, half a millisecond at full/low speeds),
- the host CPU to get around to servicing that IRQ and hand it off to
the HCD,
- the HCD to process any completed TDs that are ahead of that URB's,
- completions to be issued to the device driver for the URBs that
finished (not necessarily in the IRQ handler, maybe in a tasklet).

Add that up, and you'll see that a significant average delay between
urbs is guaranteed whenever you don't queue them.  That adds
to latency, and wastes bandwidth if there's any kind of load. At high
speed, the "opportunity cost" is higher, at least measured in terms of
bandwidth. (Latency is slightly tunable by an EHCI driver parameter.)

On the other hand, if the urbs are queued, those delays all get moved
out of critical paths -- they don't block I/O.
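
For completeness, here's the normal-status branch that was elided from
the my_rx_complete() sketch above (my_rx_packet() is a made-up hook for
handing the data up the stack):

        /* normal completion path of my_rx_complete(): */
        if (urb->status != 0)
                return;         /* e.g. -ECONNRESET after an unlink */
        my_rx_packet(dev, urb->transfer_buffer, urb->actual_length);

        /* requeue right away: the other queued urbs kept the
         * endpoint busy while we ran, so none of the IRQ and
         * completion latency shows up on the wire.  GFP_ATOMIC
         * because completion handlers may not sleep. */
        usb_submit_urb(urb, GFP_ATOMIC);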

It's easy to understand if you focus on short network packets, maybe
ones that just take two full speed USB packets: with queueing, you can
exchange nine (and a half) such network packets per frame, else it's just
one per frame. (Or at high speed they'd take part of one bulk packet,
forty per microframe versus just one...) For bigger packets there's a
smaller win, though at higher speeds (100BaseT full duplex over USB 2.0!)
the difference will be significant even for big packets.
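
To put rough numbers on the full speed case: the bus fits nineteen
64-byte bulk packets into a 1 msec frame, so a network packet needing
two of them moves at 19/2 = 9.5 per frame as long as the endpoint
queue stays full.  Unqueued, every network packet pays that whole
IRQ-plus-resubmit round trip, so you get about one per frame -- nearly
a factor of ten for small packets.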

- Dave



