On Fri, 15 Apr 2005, David Brownell wrote:

> On Friday 15 April 2005 9:45 am, Oliver Neukum wrote:
> > On Friday, 15 April 2005 at 17:13, Alan Stern wrote:
> > >
> > > The two main places where the driver disables interrupts for long
> > > periods are in the enqueue routine and the IRQ handler.
>
> Well, that's where most of the time gets spent regardless.  There's
> not a lot else an HCD does, normally: send requests to the HC, and
> receive their responses!

Certainly.  However at the moment I'm not so much concerned about where
time is spent in general; I'm more concerned about how long interrupts
are left disabled.

> > > On a Pentium IV running at 1.8 GHz:
> > >
> > >			Count	Avg (us)   Max (us)
> > >	Enqueue:	1148	   140	      706
> > >	IRQ:		2175	   101	      746
> > >
> > > On a Pentium II running at 350 MHz:
> > >
> > >			Count	Avg (us)   Max (us)
> > >	Enqueue:	1227	   511	     3000
> > >	IRQ:		2769	   248	     1935
>
> Could you summarize what tools you used to generate those numbers?
> Like what kind of driver(s) were active, with what kind of loads.
> Audio?  Storage?  Networking?  How about other statistics, like
> minimum, mean, and standard deviation?

No special measures were taken.  This was done on two ordinary
workstations.  Networking was up on the P4 but not on the P2.  No user
programs were running other than the shell and the normal background
daemons, none of which did any USB activity (in particular haldaemon
was off).  I used usb-storage (with debugging turned off, although that
shouldn't matter much).  The P4 has EHCI controllers but ehci-hcd
wasn't loaded -- otherwise the test device wouldn't have used uhci-hcd!

(A lot of kernel debugging features, like cache poisoning, were turned
on, since that's how I normally do my development.  They may have had a
significant impact.)

I didn't keep any statistics other than what you see above, and I only
ran the test a few times.  It's possible that the numbers are incorrect
because, as I realized later, I stored the initial timer value
immediately before calling spin_lock_irqsave instead of immediately
after.  I can do it over again if you want.
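For the record, the corrected measurement would look something like
this.  Just a sketch, not my actual test code -- get_cycles() is the
real kernel interface, but the lock name and the statistics variables
here are made up:

	#include <linux/spinlock.h>
	#include <linux/timex.h>	/* get_cycles() */

	/* crude statistics, in CPU cycles; divide by CPU MHz for us */
	static unsigned long stat_count;
	static unsigned long long stat_total, stat_max;

	static void timed_enqueue(struct uhci_hcd *uhci)
	{
		unsigned long flags;
		cycles_t t0, dt;

		spin_lock_irqsave(&uhci->lock, flags);
		t0 = get_cycles();	/* sample AFTER irqs go off ... */

		/* ... the real enqueue work would go here ... */

		dt = get_cycles() - t0;	/* ... stop BEFORE they come back */
		stat_count++;
		stat_total += dt;
		if (dt > stat_max)
			stat_max = dt;
		spin_unlock_irqrestore(&uhci->lock, flags);
	}

The same pattern applies in the IRQ handler, except that interrupts are
already off on entry there, so the first sample would be taken at the
top of the handler rather than after a lock acquisition.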
> It'd also be interesting to compare them for OHCI and EHCI.  I'd
> expect UHCI would be worse, because of the TD-per-packet thing, but
> also having some common baselines would be good.

Would you like to see my test code?  I'll send it to you off-list if
you want -- not because it's big but because it's so ugly.  It should
be easy enough to adapt it to OHCI and EHCI.

> Interesting that this has twice as many IRQs as URBs, and that the P4
> times for enqueue are disproportionately better.  Cache effects,
> maybe?

I don't know the answer to either question, although it should be
possible to find out why there are so many more IRQs than URBs.

> > > In my opinion, 740 us is a long time to leave interrupts disabled
> > > and 3 ms is unacceptable.
>
> Depends on the system, actually.  3 msec does seem like a lot, but
> it's not necessarily a problem.

As you say, it depends on the circumstances.  For desktop use it's
probably okay, mostly.  In other situations it would be bad.

> > > It's true that other changes I have planned for the driver will
> > > reduce these values, although it's impossible to predict by how
> > > much.  However I think this gives a pretty good indication that
> > > splitting the driver into a top- and bottom-half is worth
> > > considering.
> >
> > Why?  The worst case is in enqueue.  Enqueuing is not always
> > interrupt driven.
>
> IRQ handling is though ... :)

The point is not whether things are interrupt-driven; it's whether or
not interrupts are enabled.  In a bottom-half handler all the
time-consuming work can be done with interrupts enabled.

> > IMHO the best way to reduce times is to move all memory allocations
> > into urb allocation.
>
> That's an approach I've thought about.  Unfortunately it costs an
> invasive API change: passing the device (or better yet, the
> usb_host_endpoint) into the URB allocation.  Though to clarify: that
> would affect allocation of TDs and any urb-private data, not the data
> buffers.  Something like
>
>	usb_urb_alloc(usb_host_endpoint *ep,	// hook to HCD
>		size_t maxbuf,			// ... for prealloc of N TDs
>		unsigned n_iso,
>		unsigned gfp_flags);
>
> Heck, even just the usbcore/hcd hooks to let the HCDs cache a list of
> TDs onto the URB would help, without needing any new API ... so the
> invasive changes could be invisible (at first) to device drivers.
> TDs could be freed to the per-urb list, and on some architectures
> (like x86) the re-enqueue path might well be able to use cache-hot
> memory.

I'm not sure what would be the best/easiest approach.  Preallocating
TDs may not be good if the URB is going to live for a long time.  And
it's not clear how much of the enqueue time is spent _allocating_ the
TDs as opposed to _preparing_ them.
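Just so we're picturing the same thing, here's roughly what I take the
"freed to a per-urb or per-endpoint list" idea to mean.  A sketch only:
the names are invented, no locking is shown (a real version would need
the HCD's lock and a cap on the list length), uhci_alloc_td stands in
for whatever allocator the driver really uses, and it assumes TDs carry
a list_head for chaining:

	#include <linux/list.h>

	/* hypothetical cache of completed TDs awaiting reuse */
	struct td_cache {
		struct list_head	free_tds;
		int			count;
	};

	/* Take a recycled TD if one is available; otherwise fall
	 * back to the normal allocation path. */
	static struct uhci_td *td_cache_get(struct td_cache *cache,
			struct uhci_hcd *uhci)
	{
		struct uhci_td *td;

		if (!list_empty(&cache->free_tds)) {
			td = list_entry(cache->free_tds.next,
					struct uhci_td, list);
			list_del(&td->list);
			cache->count--;
			return td;	/* likely still cache-hot */
		}
		return uhci_alloc_td(uhci);
	}

	/* At completion time, recycle instead of freeing. */
	static void td_cache_put(struct td_cache *cache,
			struct uhci_td *td)
	{
		list_add(&td->list, &cache->free_tds);
		cache->count++;
	}

Whether the cache hangs off the URB or off the endpoint only changes
where the structure lives; the enqueue path would look the same either
way.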
> Alternatively, a per-endpoint cache of TDs might be even better ...
> less invasive to usbcore.  That wouldn't help with urb-private data,
> but for HCDs that need those it'd still just be a single kmalloc/free
> per submit.  That might facilitate addressing the UHCI-specific "lots
> of TDs" issue.  (By a scheme I once sketched: only URBs at the front
> of the queue would need TDs allocated, and as TDs get freed they
> could be mapped onto URBs towards the end.  That'd put a ceiling on
> the enqueue costs, which is a fine thing from real-time
> perspectives...)

This is one of those changes I mentioned earlier.  It shouldn't be
necessary to have more than, say, 500 TDs allocated for an endpoint at
any time.  That's about 26 ms worth (full speed moves at most 19
max-size bulk packets per 1-ms frame, and 500 / 19 is about 26), or
31 KB of data (500 x 64 bytes).  So long as a completion interrupt is
issued every 200 TDs, it should work fine.

Alan Stern