On Tue, 20 Jan 2004, David Brownell wrote:
At one point I noticed schedule glitching if the URB queue got short enough to empty; for now, that'll force a big (10 msec!) gap in the stream. If that's an issue, the fix is using deeper queues in your host side driver.
Forgive my jumping in here...
Contributions are always welcome here! Especially from folk who are using the high bandwidth modes; so far off-the-shelf devices don't try to do such stuff. So the testing is harder than usual.
I found that 'deeper' is actually limited in its effectiveness.
Watching with a CATC analyzer, at 2 KB in and 2 KB out per microframe, and preloading the frame list by starting with a large enough start_interval, I still got a 'big' gap in the stream.
The gap appeared because the completion-callback layer couldn't clear the finished URBs fast enough.
Could you elaborate a bit more about your queue setup, notably how big each urb's buffer was and how many you had queued?
And also, how much work you were doing in your urb completion callbacks ... rather than in a tasklet?
I certainly noticed that doing much printk() in completion routines -- which ran about every (11/8) milliseconds in many of the tests I did, sometimes more often -- could make big trouble in the high bandwidth modes.
It occurred to me that too much work is probably being done in_interrupt(), so I thought I'd try making scan_periodic merely offload completed URBs to a queue and, at the end of the scan, schedule a tasklet to drive the complete/callback logic. The callbacks then run at tasklet level rather than in_interrupt().
I'd recommend you do that in your device driver, instead. Your completion callback can use urb->urb_list (so long as the URB hasn't been submitted!) and schedule the tasklet.
I don't think the schedule scanning should actually be very costly ... though there is an optimization that could stand to be applied. (See the FIXME at the end of ehci-sched.c about inching up.) It's possible that the URB completion code paths could be made shorter, or cache-friendlier.
I've been meaning to oprofile this, but haven't yet done so.
Also I'm not sure how a tasklet _alone_ could change things. The data is streaming at the same rate regardless of what the CPU is doing, and the CPU would take the same amount of time (ignoring cache effects) regardless.
I'm not quite finished with it yet, but shortening the in_interrupt() path, and making the kernel more robust by removing driver callback code from the interrupt path, seemed like intrinsically good things to do anyway.
Your thoughts/comments?
In fact, that's how the EHCI driver worked in its first incarnation! The hardware is friendly to that approach; it could run entirely without IRQs, using just timer-based polling.
However, Dave Miller wanted to get rid of that to make some of the keyboard/HID integration work sanely on SPARC. And that made for some simplifications; I agreed, and here we are. (See the BK history from early December 2002.)
I'd rather avoid going back to that strategy if we don't need to. As for "need", I like to consider myself open to evidence ...
- Dave
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
