On Sun, 17 Apr 2005, David Brownell wrote:

> > I sense a common theme in these comments!  You're clearly concerned about 
> > undue delays during submission.
> 
> Among other things.  As I've pointed out, (non-PIO) HCDs basically do
> only two things:  they package URBs for HC DMA hardware, then they process
> notifications from the HC to issue completions for those URBs.  Anything
> else is, statistically, noise where correctness (including lack of any
> data stream corruption) is all that matters.
> 
> Or said differently, there are only two critical paths:  putting requests
> from driver submit onto the DMA queue; then later taking completed requests
> off the remnant of that queue (HC's forgotten them) and giving them back.

I can't argue with any of this.  Submission latency is the one aspect of 
the proposal I don't like.

> I just had a thought:  maybe one of the reasons Microsoft has such big
> per-request latencies is that they're using something analogous to tasklets.
> It's always puzzled me why they go to such effort to batch networking
> requests, for example, especially when the handful of Linux-vs-Windows
> comparisons I've seen on the same hardware show that Linux gets better
> throughput _without_ needing batching.  It could just be more evidence
> that TCP-offload architectures aren't wins ... or it could be just a
> not-so-good consequence of a particular USB design tradeoff.

Does MS really have such large per-request latencies?  How does one get
hard numbers?

> > I don't see any way around it, though.  Yes, I would like to take a lot of 
> > the stuff now done with interrupts disabled (which isn't quite the same as 
> > under the spinlock) and let them run with interrupts enabled.  Remember, 
> > however, that ->enqueue() can be called with interrupts already disabled!  
> 
> So?  We've talked about other ways to reduce the amount of time spent
> with IRQs disabled.  (Including not using memory poisoning, and other
> reductions of TD allocation costs.  As well as less invasive ways to
> reduce TD allocations.)

I'm convinced to the extent that I'll hold off on the tasklet until after 
these other changes have been made.  If they can improve the timings 
enough that there's no need for a tasklet, then so much the better.


> That conclusion isn't "necessary".  It'd increase the latency for getting
> requests onto the DMA queue.  At full (or low!) speeds that's unlikely
> to affect throughput very much ... but it would negatively affect resubmit
> paths for periodic transfers, as well as normal submit paths.

You're wrong about the resubmit paths; see below.  But yes, it will 
increase the latency for new requests.

> So at any rate, interrupt transfers would be incurring TWO additional
> tasklet-induced latencies, which could be driver-visible.  One between
> IRQ and tasklet running, for the completion.

I don't believe this delay would be very long.  Aren't tasklets run every
time the processor returns from an interrupt?  That won't take much time:
only the time needed to run higher-priority tasklets first, which seems
reasonable to me.
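
To make this concrete, here's roughly the shape I have in mind.  This is
a minimal sketch, not code from any real HCD; all the example_* names
are made up:

    #include <linux/interrupt.h>
    #include <linux/usb.h>

    struct example_hcd {
    	spinlock_t		lock;
    	struct list_head	done_list;	/* URBs finished by the HC */
    	struct tasklet_struct	done_tasklet;
    	struct pt_regs		saved_regs;	/* copied at IRQ time; see below */
    };

    static void example_done_tasklet(unsigned long data)
    {
    	struct example_hcd *ehcd = (struct example_hcd *) data;
    	unsigned long flags;

    	spin_lock_irqsave(&ehcd->lock, flags);
    	while (!list_empty(&ehcd->done_list)) {
    		struct urb *urb = list_entry(ehcd->done_list.next,
    				struct urb, urb_list);

    		list_del_init(&urb->urb_list);

    		/* give the URB back with the lock dropped and IRQs on */
    		spin_unlock_irqrestore(&ehcd->lock, flags);
    		urb->complete(urb, &ehcd->saved_regs);
    		spin_lock_irqsave(&ehcd->lock, flags);
    	}
    	spin_unlock_irqrestore(&ehcd->lock, flags);
    }

    static irqreturn_t example_irq(int irq, void *dev_id, struct pt_regs *regs)
    {
    	struct example_hcd *ehcd = dev_id;

    	/* ... ack the HC, move finished URBs onto ehcd->done_list ... */

    	tasklet_schedule(&ehcd->done_tasklet);	/* runs on IRQ return */
    	return IRQ_HANDLED;
    }

    /* at init time:
     *	tasklet_init(&ehcd->done_tasklet, example_done_tasklet,
     *			(unsigned long) ehcd);
     */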

> Although I've always recommended that periodic transfers maintain queues
> on the endpoint if they're going to defend against increased latencies,
> usually it's just ISO drivers doing that.  Any driver that potentially
> uses interrupt transfers with fast transfer intervals could notice the
> difference if both completion and submit processing go to tasklets, which
> increases software-visible latencies.

The delay for resubmission by a completion handler also won't be very 
long.  After all, the completion handler is called _by the tasklet_, so 
the tasklet is already running.  There won't be any overhead for starting 
it up.
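
For instance, a typical interrupt-URB completion handler (sketch only;
my_int_complete and the error message are invented) can resubmit
directly:

    static void my_int_complete(struct urb *urb, struct pt_regs *regs)
    {
    	if (urb->status)		/* unlinked or error: don't resubmit */
    		return;

    	/* ... consume urb->transfer_buffer ... */

    	/* We are already running in the tasklet, so this resubmission
    	 * pays no extra tasklet-scheduling cost. */
    	if (usb_submit_urb(urb, GFP_ATOMIC))
    		printk(KERN_ERR "my_driver: resubmit failed\n");
    }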


> > > The correct pt_regs -- current as of the IRQ?  I thought they
> > > were specific to that IRQ context.
> > 
> > I'm not sure about this.  My understanding, which could be wrong, is that
> > pt_regs contains nothing more than the register contents at the time the
> > interrupt was fielded. 
> 
> Last I looked, it was a pointer onto an IRQ stack.  So it'd be
> invalid during a tasklet; you can't usefully save it, even for
> the same CPU.

The argument to the completion handler is a pointer to struct pt_regs;  
the struct pt_regs itself is the register contents.  There's no reason the
driver can't make its own copy of the struct pt_regs and pass a pointer to
the copy to a completion handler later on.  (Not a copy of the original
pointer; that would be useless, as you said.)
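
In terms of the earlier sketch (made-up names again), the copy would
look something like:

    /* In the IRQ handler, before scheduling the tasklet: */
    if (regs)
    	ehcd->saved_regs = *regs;	/* copy the contents, not the pointer */

    /* Later, in the tasklet, when giving an URB back: */
    urb->complete(urb, &ehcd->saved_regs);	/* pointer to our copy */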


> > The point of my question was this.  Suppose the queue is currently 20
> > deep, with the hardware working on the first URB, when a driver unlinks
> > the fifteenth.  If the HCD takes that URB off the schedule and calls the
> > completion routine while the hardware is still way back on the first or
> > second URB, why should the driver care if the queue is still running?  
> > How could it even tell? 
> 
> By data stream corruption when the hardware keeps processing the
> requests.  Yes, tricky to detect directly, and not common ... the
> point of the current (pre-patch) API was to preclude such trouble

But if the HCD works correctly there won't be any data stream corruption.  
Certainly processing URBs 1-14 won't cause corruption, if URB 15 was 
the one that got unlinked.

> > All it cares about is that the queue should stop 
> > before the hardware reaches URB 16.  If the HCD can make that guarantee
> > without stopping the queue, why shouldn't it?
> 
> "If the queue can be stopped without stopping the queue..." ????

The idea I'm trying to get across here is that the queue doesn't have to 
be stopped _at the time the completion handler runs_.  It suffices to stop 
the queue any time before the hardware reaches the unlinked URB.  If the 
HCD can do so safely, and if that time doesn't occur until after the 
completion handler has finished, then the handler can run while the queue 
is active.

That's what my documentation change was intended to convey.  The old text 
said that the queue would be stopped when the handler runs.  The new text 
gives the HCD more leeway; it doesn't have to stop the queue ASAP but only 
when stopping becomes necessary.
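
In pseudo-C, where every helper is invented purely to illustrate the
leeway:

    static void example_unlink(struct example_hcd *ehcd, struct urb *urb)
    {
    	/* patch the TD links so the HC will skip this URB's transfers */
    	example_remove_from_schedule(ehcd, urb);

    	/*
    	 * With URB 15 unlinked while the HC is still on URB 1 or 2,
    	 * nothing needs stopping yet.  Stop only if the HC is so close
    	 * behind that it might reach the gap first.
    	 */
    	if (example_hc_close_behind(ehcd, urb))
    		example_stop_queue(ehcd, urb);

    	/* the completion handler may run while the queue is active */
    	example_giveback(ehcd, urb);
    }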


> > That's what I said: Don't report the fault completion right away when an
> > URB is unlinked; hold it back until the queue stops and all the unlinked
> > URBs are returned.  Only the "don't report right away" part is new.  The
> > existing code already takes care of completion reports when the queue
> > stops; no additional mechanism is needed there (contrary to what you
> > said).
> 
> But there _is_ a new mechanism you want to require, as you said.
> Holding back all fault completions iff an unlink is pending.

That's right.  Your case 3b needs something like this if it is to follow 
the API's guidelines.  The way the HCDs currently handle it doesn't agree 
with either the old or the new documentation.
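
Roughly, with invented names once more:

    static void example_fault_completion(struct example_hcd *ehcd,
    		struct urb *urb)
    {
    	if (example_unlink_pending(ehcd, urb))
    		/* hold it back until the queue stops */
    		list_add_tail(&urb->urb_list, &ehcd->held_urbs);
    	else
    		example_giveback(ehcd, urb);
    }

    static void example_queue_stopped(struct example_hcd *ehcd)
    {
    	/* the queue has stopped: now return the held-back fault
    	 * completions along with the unlinked URBs, in order */
    	example_giveback_list(ehcd, &ehcd->held_urbs);
    }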

Alan Stern


