On Friday 15 April 2005 9:45 am, Oliver Neukum wrote:
> On Friday, 15 April 2005 17:13, Alan Stern wrote:
> > 
> > The two main places where the driver disables interrupts for long periods 
> > are in the enqueue routine and the IRQ handler.

Well, that's where most of the time gets spent regardless.  There's
not a lot else an HCD does, normally:  send requests to the HC, and
receive their responses!


> > The results varied  
> > somewhat from run to run but they were all in the same ballpark.  The 
> > tables below show the number of events of each sort and their average and 
> > maximum times.  All times are in microseconds.  
> > 
> > On a Pentium IV running at 1.8 GHz:
> > 
> >             Count           Avg             Max
> > Enqueue:    1148            140             706
> > IRQ:        2175            101             746
> > 
> > On a Pentium II running at 350 MHz:
> > 
> >             Count           Avg             Max
> > Enqueue:    1227            511             3000
> > IRQ:        2769            248             1935

Could you summarize what tools you used to generate those numbers?
Like what kind of driver(s) were active, with what kind of loads.
Audio?  Storage?  Networking?  How about other statistics, like
minimum, mean, and standard deviation?
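
For what it's worth, those extra statistics are cheap to collect in one pass without storing samples. Here is a minimal user-space sketch using Welford's online update. The names (lat_stats and friends) are made up for illustration and are not from any driver. It uses floating point, so an in-kernel version would need integer arithmetic instead. The standard deviation is the square root of the variance it returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for latency samples, in microseconds. */
struct lat_stats {
	uint64_t count;
	double min, max;
	double mean;
	double m2;	/* running sum of squared deviations from the mean */
};

static void lat_stats_init(struct lat_stats *s)
{
	s->count = 0;
	s->min = s->max = s->mean = s->m2 = 0.0;
}

/* Welford's online update: one pass, no sample storage. */
static void lat_stats_add(struct lat_stats *s, double us)
{
	double delta;

	assert(us >= 0.0);	/* a latency cannot be negative */
	if (s->count == 0 || us < s->min)
		s->min = us;
	if (s->count == 0 || us > s->max)
		s->max = us;
	s->count++;
	delta = us - s->mean;
	s->mean += delta / (double)s->count;
	s->m2 += delta * (us - s->mean);
}

/* Sample variance; reported as 0 with fewer than two samples. */
static double lat_stats_variance(const struct lat_stats *s)
{
	return s->count > 1 ? s->m2 / (double)(s->count - 1) : 0.0;
}
```

One accumulator per event type (enqueue, IRQ) would be enough to extend the tables above with min and variance columns.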

It'd also be interesting to compare them for OHCI and EHCI.  I'd
expect UHCI would be worse, because of the TD-per-packet thing,
but also having some common baselines would be good.

Interesting that this has twice as many IRQs as URBs, and
that the P4 times for enqueue are disproportionately better.
Cache effects, maybe?

 
> > In my opinion, 740 us is a long time to leave interrupts disabled and 3 ms
> > is unacceptable.

Depends on the system, actually.  3msec does seem like a lot, but
it's not necessarily a problem.


> > People working on real-time systems often prefer to have 
> > interrupt latencies in the vicinity of 10-50 us.

They may prefer that, but they can also often live with much more;
the key point being _determinism/predictability_ more than any
single performance number.  And embedded folk will as a rule not
have CPUs as powerful as that P4.


> > It's true that other changes I have planned for the driver will reduce 
> > these values, although it's impossible to predict by how much.  However I 
> > think this gives a pretty good indication that splitting the driver into a 
> > top- and bottom-half is worth considering.
> 
> Why? The worst case is in enqueue.  Enqueueing is not always interrupt
> driven. 

IRQ handling is though ... :)


> IMHO the best way to reduce times is to move all memory 
> allocations into urb allocation.

That's an approach I've thought about.  Unfortunately it costs an invasive
API change:  passing the device (or better yet, usb_host_endpoint) into
the URB allocation.  Though to clarify:  that would affect allocation of
TDs and any urb-private data, not the data buffers.  Something like

   struct urb *usb_urb_alloc(
                struct usb_host_endpoint *ep, // hook to HCD
                size_t maxbuf,          // ... for prealloc of N TDs
                unsigned n_iso,
                unsigned gfp_flags);

Heck, even just the usbcore/hcd hooks to let the HCDs cache a list of TDs
onto the URB would help, without needing any new API... so the invasive
changes could be invisible (at first) to device drivers.  TDs could be freed
to the per-urb list, and on some architectures (like x86) the re-enqueue
path might well be able to use cache-hot memory.

Alternatively, a per-endpoint cache of TDs might be even better ... less
invasive to usbcore.  That wouldn't help with urb-private data, but for
HCDs that need those it'd still just be a single kmalloc/free per submit.
That might facilitate addressing the UHCI-specific "lots of TDs" issue.
(By a scheme I once sketched:  only URBs at the front of the queue would
need TDs allocated, and as TDs get freed they could be mapped onto URBs
towards the end.  That'd put a ceiling on the enqueue costs, which is a
fine thing from real-time perspectives...)
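
To make the per-endpoint cache idea concrete, here is a rough user-space model: a capped LIFO free list, so the most recently freed TD is the first one reused. Everything in it (struct td, ep_td_cache, td_get, td_put) is hypothetical, and calloc/free stand in for whatever the HCD really uses, since real TDs live in DMA-able memory. It covers only the cache itself, not the mapping of freed TDs onto URBs further back in the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a hardware transfer descriptor. */
struct td {
	struct td *next;
};

/* Hypothetical per-endpoint cache: LIFO free list with a size cap. */
struct ep_td_cache {
	struct td *free_list;
	unsigned n_cached;
	unsigned max_cached;
};

static void td_cache_init(struct ep_td_cache *c, unsigned max_cached)
{
	c->free_list = NULL;
	c->n_cached = 0;
	c->max_cached = max_cached;
}

/* Fast path pops a cached TD; slow path falls back to the allocator. */
static struct td *td_get(struct ep_td_cache *c)
{
	struct td *td = c->free_list;

	if (td) {
		c->free_list = td->next;
		c->n_cached--;
		td->next = NULL;
		return td;
	}
	return calloc(1, sizeof(*td));
}

/* Completed TDs go back on the list unless the cache is already full. */
static void td_put(struct ep_td_cache *c, struct td *td)
{
	assert(td != NULL);
	if (c->n_cached < c->max_cached) {
		td->next = c->free_list;
		c->free_list = td;
		c->n_cached++;
	} else {
		free(td);
	}
}

/* Release everything still cached, e.g. when the endpoint goes away. */
static void td_cache_destroy(struct ep_td_cache *c)
{
	while (c->free_list) {
		struct td *td = c->free_list;

		c->free_list = td->next;
		free(td);
	}
	c->n_cached = 0;
}
```

With a warm cache the enqueue path never touches the allocator, and the cap keeps an idle endpoint from pinning an unbounded number of TDs.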

Of course, it'd still be good to measure just _how_ the time gets spent.
I'm pretty sure that for OHCI and EHCI most of the cost is TD alloc,
more than setup or queue activation.
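
One way to check that guess would be to charge elapsed time to each phase of the enqueue path separately. A minimal sketch follows. The phase names and the phase_acct structure are assumptions for illustration, and the caller supplies timestamps from whatever clock is at hand, so nothing here depends on a particular timer API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical phases of an enqueue path. */
enum enq_phase { PH_TD_ALLOC, PH_SETUP, PH_ACTIVATE, PH_COUNT };

struct phase_acct {
	uint64_t total[PH_COUNT];	/* accumulated time, in clock units */
	uint64_t calls[PH_COUNT];	/* number of intervals charged */
};

/* Charge the interval [start, end] to one phase. */
static void phase_charge(struct phase_acct *a, enum enq_phase p,
			 uint64_t start, uint64_t end)
{
	assert(end >= start);
	a->total[p] += end - start;
	a->calls[p]++;
}

/* Return the phase with the largest accumulated time so far. */
static enum enq_phase phase_costliest(const struct phase_acct *a)
{
	enum enq_phase best = PH_TD_ALLOC;
	int p;

	for (p = 1; p < PH_COUNT; p++)
		if (a->total[p] > a->total[best])
			best = (enum enq_phase)p;
	return best;
}
```

Reading the clock at each phase boundary and calling phase_charge() there would show directly whether TD allocation really dominates.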

- Dave



