On Thu, 14 Jun 2007, David Brownell wrote:

> On Wednesday 13 June 2007, Alan Stern wrote:
> > Dave and everyone else:
> > 
> > The meaning of the return value from usb_unlink_urb() is not as clear
> > as it should be.  The value is used in only a handful of drivers,
> > mostly for debugging output.  David's code uses it for warnings and
> > error messages, but not for altering the control flow.
> 
> It's unclear what should be done, other than print diagnostics.

Quite.  That was what made me think of eliminating the return code.

> > Since the call races against URB completion, I think it never really
> > makes sense.  Here's a more detailed examination of the possibilities.  
> > Consider what happens when an URB is submitted:
> > 
> >     0. Initially the URB is unlinked and idle.
> > 
> >     1. The URB gets submitted and linked.
> > 
> >     2. The URB is passed to the HCD, which enqueues it.
> > 
> >     3. The URB completes and is dequeued by the HCD.
> > 
> >     4. The URB is unlinked and passed to the completion routine.
> > 
> >     5. The URB is once again idle (or it may be resubmitted).
> 
> That's the typical flow, yep.  No unlink/kill/etc.
> 
> The only difference between #4 and #5 is of course a refcount;
> the URB becomes idle at some point on its way into the completion
> routine, at which point it can be resubmitted.  The refcount is
> kept (starting at #1) to ensure the URB is not prematurely freed.

Right.

> > Now consider what happens if usb_unlink_urb() is called just after each
> > step above:
> 
> The parenthesis indicating the unlink() result...
> 
> 
> >     0. Nothing happens since the URB is idle.  (Error)
> 
> Since you say the after-#5 is the same as this, I'd not call
> it an error. It's a fault, sure ... but any "error" comes from
> a caller doing something that's defined to not succeed.  When
> code does what it's defined to do, that's not an error.  :)

"Error" in the sense of returning a negative errno value.  But I agree
that "fault" is a better term.

> >     1. This is the tricky part.  We want the submission to fail
> >        but it hasn't finished yet.  The URB is marked so that the 
> >        HCD will recognize it has already been unlinked, and the
> >        enqueue call will fail.  (Error?)
> 
> Either of two outcomes could work, preserving the invariant that
> URBs are submitted exactly once and then completed either by normal
> I/O completion or else by unlink:
> 
>   - Submit succeeds, unlink does too (returning -ECONNRESET or
>     whatever);
> 
>   - Neither call succeeds ... submit is aborted (though probably
>     for puzzling reason), unlink does since this becomes case #0.

Agreed.

> The first of these makes a lot more sense to me, at this point,
> since nothing another thread does "should" make the submit fail
> unless the endpoint vanishes.  You seem to be assuming the other
> path, which I'd normally file under "very surprising behavior
> from a programming interface".

Yes, it could be confusing.  On the other hand, it occurs under 
confusing circumstances: when an unlink is unsynchronized with a submit 
and arrives at just the wrong time.  In a way this falls under the "you 
get what you deserve" heading...

> Now, ISTR that when implementing this the first time, there was
> something preventing the sanest behavior here.  I forget the
> details, but I know the relevant interfaces have since been at
> least partially fixed up.

By now the interfaces certainly ought to be in good shape!

> >     2. The HCD's dequeue routine works in the usual way.  (Success)
> 
> Standard behavior.
> 
> 
> >     3. The HCD's dequeue routine fails since the URB has already
> >        been dequeued.  (Error?)
> 
> The unlink shouldn't succeed, since the URB has completed in
> every sense except the trivial one of finishing giveback().
> 
> 
> Note that you're combining "completes and dequeues" into one
> step, which doesn't really match my understanding.  The two
> are can not be coupled atomic operations!!  First completion
> happens (URB status becomes known, and is recorded).  At some
> later point -- probably after recycling at least one TD, and
> other cleanup of HCD state including internal queues -- the
> URB starts its journey back to the driver.

Although the two events aren't truly atomic, they generally occur under
the protection of a single spinlock.  That is, while holding the
spinlock the HCD learns that the URB is complete, and then while still
holding the lock it dequeues the URB and gets ready to send it back to
the driver.  That's close enough to atomic for my purposes.

> As soon as the status became known, the HCD has committed to
> returning the URB with that status, and the core "accelerate
> completion" of unlink() could no longer kick in.  That "real
> I/O status" should never be replaced by a false "unlinked"
> status code.

That's another part of the API which has never been made clear.  I 
wouldn't be surprised if HCDs were inconsistent in this regard.  As 
part of my changes, I'll insure that unlink doesn't change the status 
of a completed URB.

> And before that hardware status is detected, we're still in
> the "after step #2" case.  It's unfortunate that sometimes
> that means discarding useful hardware status reports... since
> the unlink() status would override hardware status which might
> not yet have been collected.  (Because for example maybe the
> relevant IRQ was being held off by locking during the unlink.)

The unlink status doesn't have to override hardware status reports, not 
even ones which haven't yet been collected.  If there is a relevant IRQ 
pending, the giveback must not occur until the IRQ has been serviced -- 
in which case the status from the hardware would be available in time 
to override the unlink status.

> >     4. The unlink call fails since the URB is no longer linked.
> >        (Error?)
> 
> As above for #3, except that the urb got further into its
> journey back to the driver.
> 
> 
> >     5. Same as 0.
> 
> Sounds right.
> 
> 
> > 0 is clearly an error and 2 is clearly a success.  The others aren't so 
> > clear. 
> 
> I disagree.  For #3 and #4 it's very clear:  the URB has
> started to complete normally.  Since "unlink" is nothing
> more than a way to kick in a software fault to accelerate
> the URB completion, the unlink can't succeed:  there is
> nothing to accelerate, it's already going back and with
> some "real" status!  And as you say, #5 is #0... leaving
> only #1 as potentially confusing.  (And that one hardware
> race as an annoying #2 subcase, where hardware status must
> get discarded.)

Okay.

> > In particular, in 1 can we say that the URB was unlinked when  
> > strictly speaking it never got fully submitted to begin with?
> 
> If the submit() succeeds, then there are only two ways for
> that URB to complete:  (a) normally due to I/O completion,
> fault or normal; and (b) unlink, which is a fault triggered
> by the software.
> 
> As I noted above, letting the unlink() prevent the submit()
> from succeeding seems like a pretty dubious model.  But it
> might have become less surprising due to the way kill() can
> now change the URB lifecycle.

The reason for using the "both fail" approach is that it's much 
simpler.  A single test can fail the submit right at the beginning, 
with no need for a spurious code path which simulates a submit 
followed by an immediate giveback.  There's less to go wrong and it 
runs more quickly.

One way to reduce drivers' confusion would be to have the submit call
for the pre-empted URB return -EPERM.  That's what usb_kill_urb would 
normally cause to happen and so many drivers are accustomed to it.

Drivers calling usb_unlink_urb from a different thread, not 
synchronized with URB submission, ought to expect strange things to 
happen occasionally.  If documented properly it should be acceptable.

> When I think at the USB driver level, there's only submitted().
> Those other transitions are internal, and should not be exposed
> to the drivers.
> 
> 
> > IMO the fine distinctions above simply don't matter.  Drivers don't
> > care anyway, since they can't know what stage the URB is at when they
> > try to unlink it.
> 
> Just what I said.  ;)
> 
>  
> > The only possible place where this might matter is in usbtest, which 
> > specifically intends to test the unlink code paths.  Even there, 
> > I'm not sure that usb_unlink_urb needs to return a meaningful value; 
> > the URB's final status code ought to be sufficient.
> 
> Yes, we need to test those paths to make sure they don't break
> or oops.  They used to do both.  ;)
> 
> 
> > So -- is there any objection if I change usb_unlink_urb to return void?
> 
> The main reason to care is that drivers may need to know how
> the URB will be completing ... since how (and whether) they
> clean up will be affected.

I don't know of any examples other than usbtest.  Besides, it's not the
unlink code that cleans up the URB; it's the submission or completion
code.

> That's what usbtest does, for example ... when it "loses" either
> of the internal races (before HW gets the URB, or after I/O has
> finished but before it's been handed back to the driver) it acts
> differently ... the test can't terminate until a real unlink
> happens (and shouldn't).  It never triggers cases #0/#5; the
> case #2 is how the test exits.

So usbtest in particular needs to know whether an URB really did 
undergo "accelerated" completion.  Can't it tell from the final URB 
status?  Case #2 will be the only way for usb->status to be set to 
-ECONNRESET.

> > If there is, then consider that steps 0, 1, 3, and 4 ought to return
> > distinct error codes, which they currently don't do.  (Or maybe 1
> > should count as a success?)  In practice this would just make more work
> > for drivers, since they would have to test more possible codes.
> 
> Well, surely #0/#5 should return the same fault code.
> Those cases seem unlike the others, more of a "hey driver
> writer, you screwed up" indicator.
> 
> And since #3 and #4 are essentially the same as each other,
> they should return the same code too.  These are the cases
> where the driver didn't screw up; just another minor race
> (it's an async interface after all, races inherent), not any
> kind of screwup.
> 
> If we assume that submit succeeds on #1, so that unlink()
> can succeed, then it behaves like #2.  (I find the other
> option for #1 confusing, so I won't address it.)  And #2
> is a standard "unlink succeeded" outcome.
> 
> That makes three outcomes needed by async unlink:  error
> because of driver screwup; success, accelerated completion
> coming any moment now; fault due to race lost, normal
> completion coming any moment now.

If I take my preferred approach to #1 it would get its own code: race
with submission, submission preempted.  I'm still not sure whether that
should be considered a success, a fault, or an error; since submission
wasn't complete it ought to be the same as #0, but since the submission
was preempted the unlink wasn't a complete failure.  However I guess it
doesn't really matter what we call it.

The return codes from usb_unlink_urb aren't documented, other than the
fact that it returns -EINPROGRESS when it succeeds (which seems like a
pretty strange thing to do).  If you still think a return code is
needed, my inclination is to change it to be as follows:

-EIRDM:         Case 0/5: error, URB was idle.
-ECANCELED:     Case 1: race with submit, submission was preempted.
 0:             Case 2: success, accelerated completion is underway.
-EALREADY:      Case 3/4: fault due to race with hardware completion.

What do you think?  Is the return code really needed?

Alan Stern


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to