On Thursday 14 June 2007, Alan Stern wrote:
> On Thu, 14 Jun 2007, David Brownell wrote:
> > And before that hardware status is detected, we're still in
> > the "after step #2" case.  It's unfortunate that sometimes
> > that means discarding useful hardware status reports... since
> > the unlink() status would override hardware status which might
> > not yet have been collected.  (Because for example maybe the
> > relevant IRQ was being held off by locking during the unlink.)
> 
> The unlink status doesn't have to override hardware status reports, not 
> even ones which haven't yet been collected.  If there is a relevant IRQ 
> pending, the giveback must not occur until the IRQ has been serviced -- 
> in which case the status from the hardware would be available in time 
> to override the unlink status.

That implies you're removing unlink() return codes, yes?  Because
if it doesn't override status that's not yet been reported, then
you would be wrongly reporting success for the unlink() ... since
the URB wouldn't report itself as unlinked in complete(), but the
unlink would have reported success.


> > > In particular, in 1 can we say that the URB was unlinked when  
> > > strictly speaking it never got fully submitted to begin with?
> > 
> > If the submit() succeeds, then there are only two ways for
> > that URB to complete:  (a) normally due to I/O completion,
> > fault or normal; and (b) unlink, which is a fault triggered
> > by the software.
> > 
> > As I noted above, letting the unlink() prevent the submit()
> > from succeeding seems like a pretty dubious model.  But it
> > might have become less surprising due to the way kill() can
> > now change the URB lifecycle.
> 
> The reason for using the "both fail" approach is that it's much 
> simpler.  A single test can fail the submit right at the beginning, 
> with no need for a spurious code path which simulates a submit 
> followed by an immediate giveback.  There's less to go wrong and it 
> runs more quickly.

I guess I don't follow that argument.  The issue here is that
the queue is partly managed by usbcore, so there's a window
between usbcore marking the URB as queued and then the HCD
sticking it onto the hardware queue.

So long as that window exists, unlink requests can sneak into
that window.  There's no "simulation" of a submit; the code
path is the normal one, not a "spurious" one.

At this point it might be possible to refactor things though...

With the CRIS HCD gone, the remaining legacy HCD infrastructure
can finally be removed.  We *could* move responsibility for
the per-endpoint queue entirely into HCDs, except for the head
of the unlink() code path ... exposing the per-HCD spinlock
used to protect that queue (and related stuff).  And that's
all without giving up the advantages of knowing where each
endpoint's queue is located, so usbcore can look at it etc.

That would let us get rid of case #1 entirely, including its
messiness for unlinking ... also get rid of corresponding
oddness on the giveback() paths.  A net cleanup, which would
unfortunately affect every HCD (including ones which have
not yet merged upstream, sigh).


> One way to reduce drivers' confusion would be to have the submit call
> for the pre-empted URB return -EPERM.  That's what usb_kill_urb would 
> normally cause to happen and so many drivers are accustomed to it.
> 
> Drivers calling usb_unlink_urb from a different thread, not 
> synchronized with URB submission, ought to expect strange things to 
> happen occasionally.  If documented properly it should be acceptable.

True enough.  Now that kill() has that EPERM path, unlink() can
do the same thing if that makes anything simpler.  However,
adding more fault paths seems to me like the wrong direction.


> > That's what usbtest does, for example ... when it "loses" either
> > of the internal races (before HW gets the URB, or after I/O has
> > finished but before it's been handed back to the driver) it acts
> > differently ... the test can't terminate until a real unlink
> > happens (and shouldn't).  It never triggers cases #0/#5; the
> > case #2 is how the test exits.
> 
> So usbtest in particular needs to know whether an URB really did 
> undergo "accelerated" completion.  Can't it tell from the final URB 
> status?  Case #2 will be the only way for usb->status to be set to 
> -ECONNRESET.

What matters here is testability of the interface.  What are you
proposing here, exactly, for the test?

Remember that the reason for that loop resubmitting-in-completion
is that it's hard to trigger unlinks.  And those unlink paths are
in serious need of testing, since they don't happen all that often,
and historically are *very* bug-prone parts of HCDs.


> > That makes three outcomes needed by async unlink:  error
> > because of driver screwup; success, accelerated completion
> > coming any moment now; fault due to race lost, normal
> > completion coming any moment now.
> 
> If I take my preferred approach to #1 it would get its own code: race
> with submission, submission preempted.  I'm still not sure whether that
> should be considered a success, a fault, or an error; since submission
> wasn't complete it ought to be the same as #0, but since the submission
> was preempted the unlink wasn't a complete failure.  However I guess it
> doesn't really matter what we call it.

It would be a second "fault due to race lost" case.  Arguably
some test case should make sure each of the non-error paths
happens.


> The return codes from usb_unlink_urb aren't documented, other than the
> fact that it returns -EINPROGRESS when it succeeds (which seems like a
> pretty strange thing to do).

Yeah, interfaces evolving over time will accumulate oddities
like that one.  ISTR that was at one point the way to tell
async unlinks apart from synchronous ones.

Agreed that the return codes should be documented.  And that
making that one be more regular would be good.


> If you still think a return code is 
> needed, my inclination is to change it to be as follows:
> 
> -EIRDM:       Case 0/5: error, URB was idle.
> -ECANCELED:   Case 1: race with submit, submission was preempted.
>  0:           Case 2: success, accelerated completion is underway.
> -EALREADY:    Case 3/4: fault due to race with hardware completion.
> 
> What do you think?  Is the return code really needed?

Well, I'm not keen on adding the -ECANCELED (not -EPERM for symmetry?)
case, but if you want to add it, then go ahead.  And EIDRM is odd;
that seems more like an EINVAL kind of thing.

This all gets down to testability, in my book.  It's clear to me
how we can test the interface if unlink() returns fault codes.
But if it doesn't, it's not at all clear.  And I'm not keen on
giving up testability for a primitive that has caused so much
trouble (oopsing etc) over time, and which is already tricky to
get right inside the HCDs.

- Dave



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to