On Wednesday 13 June 2007, Alan Stern wrote:
> Dave and everyone else:
> 
> The meaning of the return value from usb_unlink_urb() is not as clear
> as it should be.  The value is used in only a handful of drivers,
> mostly for debugging output.  David's code uses it for warnings and
> error messages, but not for altering the control flow.

It's unclear what should be done, other than print diagnostics.


> The real question is this:
> 
>       Under what conditions does it make sense to say that
>       usb_unlink_urb() has failed?

If it fails.  ;)


> Since the call races against URB completion, I think it never really
> makes sense.  Here's a more detailed examination of the possibilities.  
> Consider what happens when an URB is submitted:
> 
>       0. Initially the URB is unlinked and idle.
> 
>       1. The URB gets submitted and linked.
> 
>       2. The URB is passed to the HCD, which enqueues it.
> 
>       3. The URB completes and is dequeued by the HCD.
> 
>       4. The URB is unlinked and passed to the completion routine.
> 
>       5. The URB is once again idle (or it may be resubmitted).

That's the typical flow, yep.  No unlink/kill/etc.

The only difference between #4 and #5 is of course a refcount;
the URB becomes idle at some point on its way into the completion
routine, at which point it can be resubmitted.  The refcount is
kept (starting at #1) to ensure the URB is not prematurely freed.


> Now consider what happens if usb_unlink_urb() is called just after each
> step above:

The parenthesis indicating the unlink() result...


>       0. Nothing happens since the URB is idle.  (Error)

Since you say the after-#5 is the same as this, I'd not call
it an error. It's a fault, sure ... but any "error" comes from
a caller doing something that's defined to not succeed.  When
code does what it's defined to do, that's not an error.  :)


>       1. This is the tricky part.  We want the submission to fail
>          but it hasn't finished yet.  The URB is marked so that the 
>          HCD will recognize it has already been unlinked, and the
>          enqueue call will fail.  (Error?)

Either of two outcomes could work, preserving the invariant that
URBs are submitted exactly once and then completed either by normal
I/O completion or else by unlink:

  - Submit succeeds, unlink does too (returning -ECONNRESET or
    whatever);

  - Neither call succeeds ... submit is aborted (though probably
    for puzzling reason), unlink does since this becomes case #0.

The first of these makes a lot more sense to me, at this point,
since nothing another thread does "should" make the submit fail
unless the endpoint vanishes.  You seem to be assuming the other
path, which I'd normally file under "very surprising behavior
from a programming interface".

Now, ISTR that when implementing this the first time, there was
something preventing the sanest behavior here.  I forget the
details, but I know the relevant interfaces have since been at
least partially fixed up.


>       2. The HCD's dequeue routine works in the usual way.  (Success)

Standard behavior.


>       3. The HCD's dequeue routine fails since the URB has already
>          been dequeued.  (Error?)

The unlink shouldn't succeed, since the URB has completed in
every sense except the trivial one of finishing giveback().


Note that you're combining "completes and dequeues" into one
step, which doesn't really match my understanding.  The two
are can not be coupled atomic operations!!  First completion
happens (URB status becomes known, and is recorded).  At some
later point -- probably after recycling at least one TD, and
other cleanup of HCD state including internal queues -- the
URB starts its journey back to the driver.

As soon as the status became known, the HCD has committed to
returning the URB with that status, and the core "accelerate
completion" of unlink() could no longer kick in.  That "real
I/O status" should never be replaced by a false "unlinked"
status code.

And before that hardware status is detected, we're still in
the "after step #2" case.  It's unfortunate that sometimes
that means discarding useful hardware status reports... since
the unlink() status would override hardware status which might
not yet have been collected.  (Because for example maybe the
relevant IRQ was being held off by locking during the unlink.)


>       4. The unlink call fails since the URB is no longer linked.
>          (Error?)

As above for #3, except that the urb got further into its
journey back to the driver.


>       5. Same as 0.

Sounds right.


> 0 is clearly an error and 2 is clearly a success.  The others aren't so 
> clear. 

I disagree.  For #3 and #4 it's very clear:  the URB has
started to complete normally.  Since "unlink" is nothing
more than a way to kick in a software fault to accelerate
the URB completion, the unlink can't succeed:  there is
nothing to accelerate, it's already going back and with
some "real" status!  And as you say, #5 is #0... leaving
only #1 as potentially confusing.  (And that one hardware
race as an annoying #2 subcase, where hardware status must
get discarded.)


> In particular, in 1 can we say that the URB was unlinked when  
> strictly speaking it never got fully submitted to begin with?

If the submit() succeeds, then there are only two ways for
that URB to complete:  (a) normally due to I/O completion,
fault or normal; and (b) unlink, which is a fault triggered
by the software.

As I noted above, letting the unlink() prevent the submit()
from succeeding seems like a pretty dubious model.  But it
might have become less surprising due to the way kill() can
now change the URB lifecycle.


> But if  
> the unlink call hadn't been made then the submission would have 
> succeeded!  Part of the confusion results from the somewhat artificial 
> difference between being linked and being enqueued.

When I think at the USB driver level, there's only submitted().
Those other transitions are internal, and should not be exposed
to the drivers.


> IMO the fine distinctions above simply don't matter.  Drivers don't
> care anyway, since they can't know what stage the URB is at when they
> try to unlink it.

Just what I said.  ;)

 
> The only possible place where this might matter is in usbtest, which 
> specifically intends to test the unlink code paths.  Even there, 
> I'm not sure that usb_unlink_urb needs to return a meaningful value; 
> the URB's final status code ought to be sufficient.

Yes, we need to test those paths to make sure they don't break
or oops.  They used to do both.  ;)


> So -- is there any objection if I change usb_unlink_urb to return void?

The main reason to care is that drivers may need to know how
the URB will be completing ... since how (and whether) they
clean up will be affected.

That's what usbtest does, for example ... when it "loses" either
of the internal races (before HW gets the URB, or after I/O has
finished but before it's been handed back to the driver) it acts
differently ... the test can't terminate until a real unlink
happens (and shouldn't).  It never triggers cases #0/#5; the
case #2 is how the test exits.


> If there is, then consider that steps 0, 1, 3, and 4 ought to return
> distinct error codes, which they currently don't do.  (Or maybe 1
> should count as a success?)  In practice this would just make more work
> for drivers, since they would have to test more possible codes.

Well, surely #0/#5 should return the same fault code.
Those cases seem unlike the others, more of a "hey driver
writer, you screwed up" indicator.

And since #3 and #4 are essentially the same as each other,
they should return the same code too.  These are the cases
where the driver didn't screw up; just another minor race
(it's an async interface after all, races inherent), not any
kind of screwup.

If we assume that submit succeeds on #1, so that unlink()
can succeed, then it behaves like #2.  (I find the other
option for #1 confusing, so I won't address it.)  And #2
is a standard "unlink succeeded" outcome.

That makes three outcomes needed by async unlink:  error
because of driver screwup; success, accelerated completion
coming any moment now; fault due to race lost, normal
completion coming any moment now.

- Dave


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to