Re: [linux-usb-devel] [RFC] Eliminate usb_unlink_urb()'s return code

David Brownell Fri, 15 Jun 2007 10:33:58 -0700

On Thursday 14 June 2007, Alan Stern wrote:
> On Thu, 14 Jun 2007, David Brownell wrote:
> 
> > > The unlink status doesn't have to override hardware status reports, not 
> > > even ones which haven't yet been collected.  If there is a relevant IRQ 
> > > pending, the giveback must not occur until the IRQ has been serviced -- 
> > > in which case the status from the hardware would be available in time 
> > > to override the unlink status.
> > 
> > That implies you're removing unlink() return codes, yes?  Because
> > if it doesn't override status that's not yet been reported, then
> > you would be wrongly reporting success for the unlink() ... since
> > the URB wouldn't report itself as unlinked in complete(), but the
> > unlink would have reported success.
> 
> I should have been more clear about this...  Yes, there is a
> possibility that an unlink call can apparently "succeed" and yet the
> hardware could complete the URB normally before the "accelerated"  
> completion kicks in.


That's the race I pointed out.  Resolving it in favor of
providing the hardware status reports would change the
unlink() call semantics ... successful unlink would no
longer be visible in urb status during complete().


> In the spirit of giving the driver all the 
> information available from the hardware, I think that the URB's final
> status should reflect the normal completion (i.e., should be 0 or a
> hardware/protocol error code, not -ECONNRESET or -ENOENT).

That's the change in semantics... usb_unlink_urb() is now
defined to work differently (i.e. as I described it).


> At unlink time there's no way to know if this will happen.

Not true.  It's trivial:  only return success from the unlink()
if the urb status could be changed from -EINPROGRESS.  And no
code in the HCD changes status that's already -EINPROGRESS.

That's how all code should currently be working.  I know it's
how OHCI and EHCI have worked for many years ... and I just
checked the usbcore unlink glue, and I see it's doing its part
of that:  returning -EBUSY if the HCD is in the process of
returning the URB, but hasn't yet called giveback().


> Whether 
> reporting success for the unlink would be "wrong" is a matter of
> judgment; after all, the unlink did what it was supposed to and the URB
> completed promptly.

Not "judgement"; specification.  If unlink succeeds, the urb status
will always reflect that.  (Otherwise it gets hardware status).

Unlinking isn't only about completing ASAP; it's also about knowing
that particular completion path was triggered.


> Neither alternative is very satisfying...  Which reminds me, ehci-hcd
> doesn't try very hard (if at all!) to accelerate the completion of
> unlinked Iso transfers.  So shouldn't it return an error code for the 
> unlink attempt?

It returns as fast as it can.  How is that not accelerated?  :)

I don't see much point to returning a failure code there.  The
issue is that cleaning up in those cases isn't as simple as for
transfers that have a QH.  Every ISO TD would need to be disabled
and then the schedule would need to be cleaned up.  It's just that
nobody has implemented such stuff.


> > So long as that window exists, unlink requests can sneak into
> > that window.  There's no "simulation" of a submit; the code
> > path is the normal one, not a "spurious" one.
> 
> (Nitpicking: No, there really is a separate code path in many of the 
> HCDs.  For example, in sl811h_urb_enqueue() there is:

That's the normal code path.  It has a special case, yes,
but that's just one of many.


>       /* in case of unlink-during-submit */
>       spin_lock(&urb->lock);
>       if (urb->status != -EINPROGRESS) {
>               spin_unlock(&urb->lock);
>               finish_request(sl811, ep, urb, 0);
>               retval = 0;
>               goto fail;
>       }
> 
> I don't know very much about that driver, but at first glance it seems
> possible that this path doesn't stop the endpoint queue -- which would
> be a bug.)

That driver is prototypical in that the only endpoint queue is the one
maintained by usbcore.  There's nothing like a QH for the hardware to
maintain ... other controller drivers work that way, but they've not
gotten to the main kernel tree yet.  That's correct as written.

 
> > At this point it might be possible to refactor things though...
> > 
> > With the CRIS HCD gone, the remaining legacy HCD infrastructure
> > can finally be removed.
> 
> Hah!  You're talking about stuff from before I started working on 
> usbcore.  What parts of the HCD infrastructure count as "legacy" 
> nowadays?

Design assumptions, many of which are getting cleaned up
incrementally.  (Like the one which meant usbcore couldn't rely
on knowing where the URB queue lived; or the one which causes
async unlinks to return -EINPROGRESS not zero for success)  The
split between usb_bus and usb_hcd.  The bandwidth reservation
stuff.  And surely a few other things.


> (And speaking of bizarre HCDs, what's the story on u132-hcd.c?  Is it 
> likely ever to be cleaned up?)

Dammifino.  It doesn't seem to be maintained, despite being
quite new.

 
> >  We *could* move responsibility for
> > the per-endpoint queue entirely into HCDs, except for the head
> > of the unlink() code path ... exposing the per-HCD spinlock
> > used to protect that queue (and related stuff).
> 
> Yes.  Note that hcd_data_lock isn't per-HCD, even though it ought to
> be.  If it were then we wouldn't have to expose anything.

Doesn't "need" to be, as it's now used; the normal rule of
thumb is that locks should only be fine grained if making
them be that way reduces contention.

But if that responsibility were moved, that lock would need
to be replaced by a per-HCD lock.


> >  And that's
> > all without giving up the advantages of knowing where each
> > endpoint's queue is located, so usbcore can look at it etc.
> > 
> > That would let us get rid of case #1 entirely, including its
> > messiness for unlinking ... also get rid of corresponding
> > oddness on the giveback() paths.  A net cleanup, which would
> > unfortunately affect every HCD (including ones which have
> > not yet merged upstream, sigh).
> 
> Well, the stuff I'm working on affects all the HCDs anyway.  This would 
> just be more of the same.  On the whole I think it's a good thing to 
> do.  It would remove cases 1 and 4, rendering much of our discussion 
> moot.

It'd be a lot more work too.  ;)

But yes, I mentioned it because it seemed likely to be a good
idea.  I was never happy with that split, but it was needed
as an interim measure.


> > > So usbtest in particular needs to know whether an URB really did 
> > > undergo "accelerated" completion.  Can't it tell from the final URB 
> > > status?  Case #2 will be the only way for usb->status to be set to 
> > > -ECONNRESET.
> > 
> > What matters here is testability of the interface.  What are you
> > proposing here, exactly, for the test?
> > 
> > Remember that the reason for that loop resubmitting-in-completion
> > is that it's hard to trigger unlinks.  And those unlink paths are
> > in serious need of testing, since they don't happen all that often,
> > and historically are *very* bug-prone parts of HCDs.
> 
> You're talking about unlink1() in usbtest, right?

Yes.


> The values it tests 
> for are -EBUSY and -EIDRM -- slightly odd since -EINPROGRESS is the
> only documented value for usb_unlink_urb.

I think the docs didn't catch up to the code in that case.
(One of many such cases...)  Both of those codes are returned
by usbcore on the unlink paths.

 
> I would make it keep calling usb_unlink_urb() repeatedly (with
> msleep(1) in between iterations) until the completion routine told it
> that either a resubmit failed or urb->status == -ECONNRESET.  Put a cap
> on the number of iterations (e.g., 5) to tell when the test fails.

But hitting that cap wouldn't indicate failure, it would
just indicate lack of patience ... that would produce loads
of false failures.

Why bother with the msleep()?


> > > If I take my preferred approach to #1 it would get its own code: race
> > > with submission, submission preempted.  I'm still not sure whether that
> > > should be considered a success, a fault, or an error; since submission
> > > wasn't complete it ought to be the same as #0, but since the submission
> > > was preempted the unlink wasn't a complete failure.  However I guess it
> > > doesn't really matter what we call it.
> > 
> > It would be a second "fault due to race lost" case.  Arguably
> > some test case should make sure each of the non-error paths
> > happens.
> 
> Hmm.  It's very hard to provoke and test races without adding
> artificial delays...

Granted the race exists, just running the test a LOT of times
will produce it eventually.

 
> > > If you still think a return code is 
> > > needed, my inclination is to change it to be as follows:
> > > 
> > > -EIRDM:       Case 0/5: error, URB was idle.
> > > -ECANCELED:   Case 1: race with submit, submission was preempted.
> > >  0:           Case 2: success, accelerated completion is underway.
> > > -EALREADY:    Case 3/4: fault due to race with hardware completion.
> > > 
> > > What do you think?  Is the return code really needed?
> > 
> > Well, I'm not keen on adding the -ECANCELED (not -EPERM for symmetry?)
> > case, but if you want to add it, then go ahead.  And EIDRM is odd;
> > that seems more like an EINVAL kind of thing.
> 
> We won't need -ECANCELED if we take your suggestion regarding the HCD
> infrastructure. 

Ergo said fiendish suggestion.  :)


> And as discussed above, the -EALREADY cases aren't 
> reliably detectable at unlink time.

Today they'd return -EBUSY ... see hcd_unlink_urb().
Once the HCD maps hardware status into URB status,
the unlink code refuses to report success, since the
URB is already in the process of being unlinked. 

 
> As for -EIDRM -- I looked through the lists of error codes, and there
> didn't seem to be anything along the lines of -ENOTBUSY!  But -EINVAL
> will do just as well.

Or, we can just leave it the way it is today:  EIDRM is more
informative than EINVAL, since usbcore only returns it in
that one place (unlinking an URB that's not currently linked).
Ignore my comment about that one.  ;)


> > This all gets down to testability, in my book.  It's clear to me
> > how we can test the interface if unlink() returns fault codes.
> > But if it doesn't, it's not at all clear.  And I'm not keen on
> > giving up testability for a primitive that has caused so much
> > trouble (oopsing etc) over time, and which is already tricky to
> > get right inside the HCDs.
> 
> To be really thorough, the test would have to correlate the code from 
> unlink() with the value of urb->status.  With proper synchronization 
> between the completion handler and the main routine, this should be 
> possible.

You mean, like the code now found in the unlink1() test, which
verifies that the async unlink always returns -ECONNRESET?


> That still leaves the question of how to test all the various race 
> outcomes.  I can't think of any good way to do it.

This is where probabalistic testing comes in.  Luckily it's not
something that needs to be run so much any more, now that we seem
to have squished most the bugs at that level into technicolor goo.


> So what do you think?  Go ahead and move the endpoint-queue stuff into 
> the HCDs?

If you're going to volunteer to do that, I think that would be
a Fine Thing.  :)

- Dave


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] [RFC] Eliminate usb_unlink_urb()'s return code

Reply via email to