Re: [linux-usb-devel] [RFC] Eliminate usb_unlink_urb()'s return code

Alan Stern Fri, 15 Jun 2007 14:30:00 -0700

On Fri, 15 Jun 2007, David Brownell wrote:

> > I should have been more clear about this...  Yes, there is a
> > possibility that an unlink call can apparently "succeed" and yet the
> > hardware could complete the URB normally before the "accelerated"  
> > completion kicks in. 
> 
> That's the race I pointed out.  Resolving it in favor of
> providing the hardware status reports would change the
> unlink() call semantics ... successful unlink would no
> longer be visible in urb status during complete().


But would it in fact be a successful unlink?  The _only_ difference in
behavior (at both the driver level and the hardware level) is in the
URB's completion status.  And if we resolve the race in favor of the
hardware status then even that distinction would be gone.

I know I'm going to regret this, but I propose that the API be changed
so that an URB's final status be set to -ECONNRESET or -ENOENT if and
only if it was terminated early by an unlink request; in other words,
only if it really was cancelled at the hardware level.  As a corollary
of this change, the return value from usb_unlink_urb would indicate
only whether the HCD accepted the unlink request, not whether the
request caused the URB to terminate with an error status.  This leads
to the possibility of an apparently successful unlink followed by a
successful URB completion; I think this is acceptable.

> That's the change in semantics... usb_unlink_urb() is now
> defined to work differently (i.e. as I described it).

Yes, this is a change.  We can afford to make it because no drivers pay
close attention to usb_unlink_urb's return value except for usbtest.  
In fact, only about five drivers pay any attention to it at all.


> > At unlink time there's no way to know if this will happen.
> 
> Not true.  It's trivial:  only return success from the unlink()
> if the urb status could be changed from -EINPROGRESS.  And no
> code in the HCD changes status that's already -EINPROGRESS.

A misunderstanding -- I meant that at unlink time there's no way to
know if the URB will complete normally before the HCD can take it away
from the hardware.

> That's how all code should currently be working.  I know it's
> how OHCI and EHCI have worked for many years ... and I just
> checked the usbcore unlink glue, and I see it's doing its part
> of that:  returning -EBUSY if the HCD is in the process of
> returning the URB, but hasn't yet called giveback().

I think that _is_ how all the code currently works -- and I think all
the code should be changed.

> > Whether 
> > reporting success for the unlink would be "wrong" is a matter of
> > judgment; after all, the unlink did what it was supposed to and the URB
> > completed promptly.
> 
> Not "judgement"; specification.  If unlink succeeds, the urb status
> will always reflect that.  (Otherwise it gets hardware status).
> 
> Unlinking isn't only about completing ASAP; it's also about knowing
> that particular completion path was triggered.

I'd agree with you if any drivers besides usbtest actually used that
information.  In my experience, async unlinking _is_ only about
completing ASAP.  That was part of the reason for writing usb_kill_urb.

> > Neither alternative is very satisfying...  Which reminds me, ehci-hcd
> > doesn't try very hard (if at all!) to accelerate the completion of
> > unlinked Iso transfers.  So shouldn't it return an error code for the 
> > unlink attempt?
> 
> It returns as fast as it can.  How is that not accelerated?  :)
> 
> I don't see much point to returning a failure code there.  The
> issue is that cleaning up in those cases isn't as simple as for
> transfers that have a QH.  Every ISO TD would need to be disabled
> and then the schedule would need to be cleaned up.  It's just that
> nobody has implemented such stuff.

It's easy to believe that cleaning up would be complex.  uhci-hcd does 
it, however.  (Not that I intend to write analogous code for ehci-hcd!)


> >     /* in case of unlink-during-submit */
> >     spin_lock(&urb->lock);
> >     if (urb->status != -EINPROGRESS) {
> >             spin_unlock(&urb->lock);
> >             finish_request(sl811, ep, urb, 0);
> >             retval = 0;
> >             goto fail;
> >     }
> > 
> > I don't know very much about that driver, but at first glance it seems
> > possible that this path doesn't stop the endpoint queue -- which would
> > be a bug.)
> 
> That driver is prototypical in that the only endpoint queue is the one
> maintained by usbcore.  There's nothing like a QH for the hardware to
> maintain ... other controller drivers work that way, but they've not
> gotten to the main kernel tree yet.  That's correct as written.

I had to think about this...  The driver is relying on the fact that 
interrupts are disabled during giveback, right?  So the endpoint queue 
_can't_ advance.

But consider this pathological case.  An URB is submitted to an empty
queue and unlinked before the submission is complete.  The completion
handler then submits a second URB.  Won't that URB start executing on
the hardware before the completion handler returns?


> > (And speaking of bizarre HCDs, what's the story on u132-hcd.c?  Is it 
> > likely ever to be cleaned up?)
> 
> Dammifino.  It doesn't seem to be maintained, despite being
> quite new.

It's an amazing source file, full of style violations, repeated code 
sequences, unprotected data accesses, and probably much more...


> > Well, the stuff I'm working on affects all the HCDs anyway.  This would 
> > just be more of the same.  On the whole I think it's a good thing to 
> > do.  It would remove cases 1 and 4, rendering much of our discussion 
> > moot.
> 
> It'd be a lot more work too.  ;)

Not that much more.  It would boil down to adding a couple of 
subroutine calls to each enqueue routine and a subroutine call to the
finish_request analog.  (Except for u132, which calls giveback_urb in 
multiple places...)


> > I would make it keep calling usb_unlink_urb() repeatedly (with
> > msleep(1) in between iterations) until the completion routine told it
> > that either a resubmit failed or urb->status == -ECONNRESET.  Put a cap
> > on the number of iterations (e.g., 5) to tell when the test fails.
> 
> But hitting that cap wouldn't indicate failure, it would
> just indicate lack of patience ... that would produce loads
> of false failures.

Figure the odds.  How long does it take the test URB to complete
compared to the amount of time needed for resubmitting?  If the test
URB transfers enough data (say more than 1 ms worth) then we can be
nearly certain of hitting a time when the URB is still in progress.  
If 5 iterations isn't enough use 10 instead.

> Why bother with the msleep()?

For SMP systems; it's not needed on UP.  Without it you'd be in a
busy-loop, trying over and over again to unlink the URB.  If you just
happened to hit the wrong time then you'd rack up 5 or 10 unlink
attempts very quickly without giving the resubmit pathway enough time
to run.

> > To be really thorough, the test would have to correlate the code from 
> > unlink() with the value of urb->status.  With proper synchronization 
> > between the completion handler and the main routine, this should be 
> > possible.
> 
> You mean, like the code now found in the unlink1() test, which
> verifies that the async unlink always returns -ECONNRESET?

Not exactly.  The current code can do multiple unlinks and multiple 
submits; it doesn't check that -ECONNRESET is the status of the URB 
that was in flight at the time of the successful unlink.  It only knows 
that _one_ of the unlinks succeeded and _one_ of the URBs got 
-ECONNRESET.


> > > > If you still think a return code is 
> > > > needed, my inclination is to change it to be as follows:
> > > > 
> > > > -EIRDM:       Case 0/5: error, URB was idle.
> > > > -ECANCELED:   Case 1: race with submit, submission was preempted.
> > > >  0:           Case 2: success, accelerated completion is underway.
> > > > -EALREADY:    Case 3/4: fault due to race with hardware completion.
> > > > 
> > > > What do you think?  Is the return code really needed?
> > > 
> > > Well, I'm not keen on adding the -ECANCELED (not -EPERM for symmetry?)
> > > case, but if you want to add it, then go ahead.  And EIDRM is odd;
> > > that seems more like an EINVAL kind of thing.
> > 
> > We won't need -ECANCELED if we take your suggestion regarding the HCD
> > infrastructure. 
> 
> Ergo said fiendish suggestion.  :)

Come to think of it, there is one situation where -ECANCELED might be 
appropriate: if the URB was already unlinked.  You'd probably call that 
-EBUSY, though.  But see below...

> > And as discussed above, the -EALREADY cases aren't 
> > reliably detectable at unlink time.
> 
> Today they'd return -EBUSY ... see hcd_unlink_urb().
> Once the HCD maps hardware status into URB status,
> the unlink code refuses to report success, since the
> URB is already in the process of being unlinked. 

After spending a fair amount of time on reducing the reliance on 
urb->status, I realized that we ought to be able to eliminate it almost 
entirely.  Here's what I mean:

In the absence of an unlink, HCDs should not need to store anything in
urb->status until they call giveback_urb.  Right now they _do_ store
error codes there, but AFAIK there's no reason why an URB shouldn't be
given back at the time the error is discovered -- which means the HCD
would be able to keep the error status in a local variable rather than
storing it in the URB.  (Iso URBs may present a minor problem, since
errors don't force them to complete early.)

Doing this would require some reorganization of the HCDs.  The ones 
which could benefit from it the most are the ones which don't allocate 
an urb_priv structure -- which unfortunately are the ones I'm least 
familiar with.  For example, if I read the isp116x code correctly, it 
determines the statuses of all completed URBs for an endpoint before 
giving any of them back.  But there's no apparent reason why it 
couldn't give them back one by one, as the status is determined.

Now this would not allow us to do away with urb->status entirely.  It 
is still needed for storing unlink error codes, since URBs cannot be 
given back immediately when they are unlinked.  It's also needed by 
usbcore, because endpoint_disable has to know which URBs have already 
been cancelled.

Still, it's better to have a field just for storing unlink codes than
one for overall status.  In addition we won't need urb->lock any more, 
since there won't be any contention between usbcore and the HCD for 
setting the status.  It's an overall win -- but I don't know enough 
about the drivers to do it.  What's your advice?


> > So what do you think?  Go ahead and move the endpoint-queue stuff into 
> > the HCDs?
> 
> If you're going to volunteer to do that, I think that would be
> a Fine Thing.  :)

Good.

Alan Stern


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] [RFC] Eliminate usb_unlink_urb()'s return code

Reply via email to