On Monday 11 April 2005 12:07 pm, Alan Stern wrote:
> On Mon, 11 Apr 2005, David Brownell wrote:
> 
> > >   * Host Controller Drivers (HCDs) place all the URBs for a particular
> > >   * endpoint in a queue.  Normally the queue advances as the controller
> > > ...
> > > + * hardware processes each request.  But when an URB terminates with an
> > > + * error its queue stops, at least until that URB's completion routine
> > > + * returns.  It is guaranteed that the queue will not restart until all
> > > + * its unlinked URBs have been fully retired, with their completion
> > > + * routines run, even if that's not until some time after the original
> > > + * completion handler returns.  Normally the same behavior and guarantees
> > > + * apply when an URB terminates because it was unlinked; 
> > 
> > That "normally" bit restates the guarantee provided above.
> > 
> > The redundancy makes things less clear (IMO) because it's
> > weakening the guarantee that "when an URB terminates with
> > an error" (including an unlink or kill status!!) its queue
> > stops ... that says "always", this says "sometimes maybe".
> 
> That's just it.  The first part applies only when an URB completes with a
> hardware error, _not_ an unlink or kill! 

It was intended to cover _all_ fault reports.  Unlink/kill is just
a software-induced fault.


> Do you think this needs to be 
> made more clear?  The second part applies when an URB terminates because
> of an unlink or kill.  It doesn't weaken the first guarantee because it
> refers to different circumstances.

Well, lately I've taken to talking about "faults" as an umbrella
covering both hardware-induced stoppage and software-induced
unlink/kill.  In fact that's what the original text said ...
which also addresses Oliver's comment.


> > > + *                                                        however if an 
> > > + * URB is unlinked before the hardware has started to execute it, then
> > > + * its queue is not guaranteed to stop until all the preceding URBs have
> > > + * completed.
> > 
> > Well, that's because no urb before that one will have
> > completed with any fault code, so there was no need
> > for the queue to have stopped.
> 
> True.

And "normally" the action of an HCD for an unlink request must be
to deschedule/stop that endpoint's queue...

So there are three cases, depending on the queue state.  Always
have been, in fact; can't really change anything.

(1) Queue stopped before HC touched that URB.  This is the
    cleanest case.  Any preceding requests will be processed
    normally ... e.g. completing with success, or possibly
    because of some problem the hardware reported.  And that
    URB will get a clean unlink report, actual_length 0.

(2) Queue stopped part way through that URB (0..transfer_length
    bytes transferred).  Same thing, except that the URB being
    unlinked may have nonzero actual_length in the completion.

(3) (Unlink RACE case) Queue didn't manage to stop before the
    hardware fully processed that urb, and processed the whole
    transfer.  There are two subcases here:  (a) queue stopped
    before that unlink completion was issued, so again the
    original text should be true; (b) queue still active, which
    is the only problem case.

See below for (3b).  For all other cases, the original statement
is true ... and the (3b) problem has always existed.

I suspect a lurking issue here may really be that UHCI still
has some issues with how it handles unlinking, which may not be
fixable until it actually has a solid notion of endpoint queue.
(Call that leftover 2.4-ism, if so.  The whole notion of even
being able to unlink _just one_ request is very dubious; if you
look at the USB spec, it talks about killing _all_ IRPs at once.)

The downside of that would be potential loss of integrity of
the usb_sg_*() queues on fault paths ... hard to test, and also
fortunately rare to be real trouble.


> >  The relevant urb was not yet terminated.
> 
> False.  The relevant (i.e., the unlinked) URB is terminated by the time
> its completion handler gets called.

Well, its termination is the same thing as the completion handler
being called.  The request is pending until that point, and is
no longer pending after that.


> This may well occur before all the 
> preceding URBs have terminated, since completion handlers are allowed to
> run out of sequence when an URB is unlinked.

To safely run that out of sequence, its queue clearly stopped in some
sense.  If there's a hardware DMA queue (as with all the PCI based
HCs), that queue must have been stopped in order to safely take that
URB out of the queue.  That's just because of how DMA works; there
can be lots of caching involved, and the HCD must handshake with the
HC to ensure that URB isn't being worked on.  The only reliable way
to do that is ensure that _all_ urbs in that queue aren't being worked.

For PIO based HCs, there are two basic cases: (a) this URB has a pending
transfer that must be completed first; or (b) it's later in the queue,
and can be removed immediately.  The first case is effectively just like
coping with DMA I/O caching; while the second is a useful simplification,
since the HCD can know (unlike with DMA queues!) that it's safe.

(And FWIW, most of the current UDC drivers work more like the PIO case
there.  Only the net2280 driver has DMA queue support, and while that
worked fine for IN/TX it was a real bear for OUT/RX ... so it's not
normally used.  The DMA drivers don't use DMA queues.)


> According to the original wording the queue _would_ be stopped when the
> unlinked URB's completion handler was called, even though the hardware
> might not have reached the URB yet.

Yes.  How can you avoid stopping the queue in that case?  If the hardware
is chewing through the queue, the _only_ way to control it is to stop
that endpoint queue.  Otherwise the HC could be munching away on that
very transfer while the HCD was handing it back to some driver...


> The old text said: 
> 
>       But when an URB terminates with any fault (such as an error, or 
>       being unlinked) its queue stops...
> 
> So the new text does describe different behavior from the old text.

OK, so that new text is wrong then.  The only problem case has ever
been (3b), where the URB completed (at the hardware level) before the
queue could be stopped.

I don't see a clean way to handle (3b), but on the other hand it
doesn't matter with current drivers.  All code that unlinks one
URB is intending to nuke the whole queue (remember that issue?):

  * usb_sg_cancel() does it back-to-front and goes to some effort to
    ensure that there's a clean abort after the first N bytes, no
    matter when exactly the hardware stopped working on the queue;

  * usbnet doesn't do back-to-front (for no good reason) but in that
    case it doesn't matter ... packets can be dropped or reordered
    by definition.

So I'm not too worried about case (3b), although I think it should
get fixed someday ... by removing the "unlink one URB" semantics,
replacing them with a "kill this endpoint queue" request which is
not "buggy-as-designed" (because of a hardware race in the API).

- Dave



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to