Re: [linux-usb-devel] optimizing throughput

David Brownell Wed, 05 Feb 2003 20:15:17 -0800

Duncan Sands wrote:

On Tuesday 04 February 2003 22:53, David Brownell wrote:

Duncan Sands wrote:

(1) is there any speed advantage to submitting multiple urbs to
a bulk endpoint, as compared to using a single urb which is
resubmitted in its completion handler?

Yes, most noticeably at high speed where the time to report
the completion and resubmit the urb can easily be enough time
to transfer tens of kbytes.  For full speed devices, that same
delay would cost maybe 256 bytes of wasted throughput.


Hi Dave, thanks for your reply.  This is a full speed device
(speedtouch modem).  Hmmm, 256 bytes is about 4% for
me.  Is the above wastage estimate based on a completion
handler that does nothing but resubmit (i.e. zero wasted

256 bytes was about 1/3 frame on a lightly loaded bus; or
10-20K on a comparably loaded high speed bus.

The 1/3 frame number is realistic for the configurations I
had occasion to look at -- normal delays most devices will
routinely encounter over a day's use.

cycles)?  If there are other usb devices competing for
bandwidth, this will only get worse, right?

Here's where I have to say "do your own performance model"!

IRQ delays can shrink a bit, or even grow to several frames.
If the USB bandwidth had contention, that number could easily
shrink unless the resubmitted urb had bandwidth reserved for it.
And there's PCI contention too...
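
A rough sketch of the kind of back-of-envelope model meant here, not a measurement. It multiplies an assumed resubmit delay by an assumed payload rate to estimate how much data an idle endpoint could have carried. The full speed rate of 768 bytes/ms is only what the "256 bytes was about 1/3 frame" figure above implies; the high speed rate is the USB 2.0 specification ceiling for bulk (13 packets of 512 bytes per 125 us microframe), which a real bus will not reach. The function and constant names are made up for illustration.

```python
# Toy estimate of payload lost while an endpoint queue sits empty.
# Both rates are assumptions for illustration, not measured values.

# Implied by "256 bytes was about 1/3 frame" (a full speed frame is 1 ms).
FULL_SPEED_BYTES_PER_MS = 768

# Spec ceiling for high speed bulk: 13 packets x 512 bytes per microframe,
# 8 microframes per millisecond. Real throughput is lower.
HIGH_SPEED_MAX_BYTES_PER_MS = 13 * 512 * 8


def lost_bytes(delay_ms, bytes_per_ms):
    """Payload that could have moved during delay_ms of idle endpoint time."""
    return delay_ms * bytes_per_ms


if __name__ == "__main__":
    # 1/3 frame as quoted above, then a delay that has grown to several frames.
    for delay_ms in (1.0 / 3.0, 3.0):
        fs = lost_bytes(delay_ms, FULL_SPEED_BYTES_PER_MS)
        hs = lost_bytes(delay_ms, HIGH_SPEED_MAX_BYTES_PER_MS)
        print(f"delay {delay_ms:.2f} ms: full speed ~{fs:.0f} bytes, "
              f"high speed up to ~{hs:.0f} bytes")
```

With the 1/3 ms delay the high speed ceiling comes out inside the 10-20K range quoted above; substitute measured delays and rates for anything beyond a sanity check.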

Plus there are other factors.  Different controller hardware
manages USB (and PCI) access differently.  If one latency
gets high enough, it might dominate and kick in some new
mode ... that VT8235 slowdown at high speed on 2.4 looks like
such an issue.

(Same thing applies to that folklore about when an urb that
you resubmit can actually start.  You can't know, so it's
better to avoid the question by having more than one urb in
the queue ... periodic 1 msec interrupt transfers on 2.4 can't
reliably deliver transfers at that rate, as irq delays rise.)
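
To illustrate why a deeper queue sidesteps the question, here is a toy simulation under loud assumptions: every urb takes the same time on the wire, the resubmit delay is constant, and nothing else competes for the bus. None of that holds on real hardware, and the function name and parameters are invented for this sketch. An urb can start only once the previous urb has finished and once it has itself been resubmitted.

```python
def simulate_efficiency(queue_depth, urb_time, resubmit_delay, total_urbs=1000):
    """Fraction of elapsed time the endpoint spends moving data.

    Toy model (assumption): queue_depth urbs are recycled round-robin, and
    each one is resubmitted resubmit_delay after it completes.
    """
    completions = []
    for k in range(total_urbs):
        prev_done = completions[-1] if completions else 0.0
        if k < queue_depth:
            available = 0.0  # the initial urbs are all queued at time zero
        else:
            # This slot's urb last completed queue_depth transfers ago.
            available = completions[k - queue_depth] + resubmit_delay
        start = max(prev_done, available)
        completions.append(start + urb_time)
    return total_urbs * urb_time / completions[-1]


if __name__ == "__main__":
    # Hypothetical numbers: each urb takes 1 ms on the wire, resubmit costs 1/3 ms.
    for depth in (1, 2, 4):
        eff = simulate_efficiency(depth, urb_time=1.0, resubmit_delay=1.0 / 3.0)
        print(f"queue depth {depth}: efficiency {eff:.3f}")
```

In this model the endpoint never idles once the other queued urbs take at least as long to transfer as the resubmit delay, so a single urb loses time on every cycle while two or more hide the delay entirely. If the delay grows to several frames, the queue has to be correspondingly deeper.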

Passing N times through the code, rather than one, will take
more CPU time ... likely not significant on its own, but
other factors may come into play.  Like queueing, the costs
of DMA mapping buffers (talk to IOMMU?), the costs of merging
into a single buffer, and so on.


I was thinking of the costs of setting up the DMA etc, i.e. USB subsystem
costs - are they significant?

On x86 and MIPS or PPC systems with dma-coherent caches,
they're insignificant.  And if you're not doing repetitive
I/O, you can't avoid the costs; maybe lessen them, by using
an IOMMU for scatterlist calls.  (Hmm, x86_64 uses the AGP
GART as an IOMMU during highmem pci dma...)

If you're doing repetitive calls, like data capture systems,
then it may be worth pre-allocating a dma buffer or, if you
can't do it that way, manually mapping and synchronizing
between calls.

- Dave




