On Sun, 7 Dec 2003, Duncan Sands wrote:

> This just came up in devio.c: the "synchronous" call to usb_unlink_urb
> was returning immediately with -EBUSY (probably because the core
> just nuked the urbs) even though the completion handlers hadn't
> finished.  The code assumed (of course - doesn't everybody?) that the
> completion handlers had run, and moved on to do bad things, leading to
> a kernel panic.

Lots of drivers make that assumption.

> In my opinion synchronous usb_unlink_urb should not return until
> completion handlers have been run.  If someone has already started
> unlinking the urb, so much the better: the call should wait until that has
> finished, then return.

That was the essence of my suggested patch.

> PS: The speedtouch driver gets it right (or so I hope), at the expense
> of an ugly block of code in the disconnect routine.

I'm very familiar with only a couple of drivers (usb-skeleton and 
usb-storage) and they both get it right.  class/audio.c looks okay too at 
a hasty check.  But none of the other class drivers I looked it were 
correct.


On Sun, 7 Dec 2003, Oliver Neukum wrote:

> More than that. New submissions of that URB must fail, or the newly
> submitted URB unlinked, if the completion handler has won the race.

My patch does this (submissions made before the synchronous unlink returns 
will fail).


On Sun, 7 Dec 2003, David Brownell wrote:

> But did any kernel ever guarantee those semantics?  I don't think so.
> Certainly 2.6 never did, and neither OHCI nor EHCI on 2.4 does...

I agree, they didn't.  However, that lack didn't prevent driver authors 
from _assuming_ that they did.

> Even today (because of "no uhci_endpoint_disable") drivers that don't
> wait for all their completions in disconnect() will have unique problems.
> That family of problems was worse on 2.4 kernels.
> 
> Drivers now could easily get those cleanup semantics with something
> much smaller than your patch (though hcd_unlink_urb should still   
> get rid of that splice logic!):
> 
>    static inline void urb_begone(struct urb *urb)
>    {
>        while (usb_unlink_urb (urb) == -EBUSY) {
>            set_current_state (TASK_UNINTERRUPTIBLE);
>            schedule_timeout (HZ/100);
>        }
>    }

Whether this is smaller than my patch is a question of interpretation: 
Yours is a much bigger change for the device driver!  It also doesn't 
address the problems of failing resubmission and races between submit and 
unlink.

Yes, I know you think drivers should solve those problems themselves.  
And clearly to a great extent they do.  But they don't all solve all the
problems.  I think the difficulty is that historically the guarantees made
by the synchronous unlink call were not spelled out clearly enough, so a
lot of people misunderstood them.  I did at first.

> I still prefer async unlinks though ... maybe because it pushes
> device drivers to pay closer attention and avoid odd stuff ... 

Agreed.  Ideally, I would like to make usb_unlink_urb() always be 
asynchronous and add usb_unlink_urb_synch() as a new function.  
Unforunately, the majority of unlink calls in existing drivers are of 
the synchronous kind.

> I think an urb_begone() should be relatively doable almost instantly
> as a driver-specific helper.  Likely every driver that tries to use
> synchronous unlink should be audited for disconnect and resubmit, and
> one mark that it's been audited (if not entirely fixed) could be using
> urb_begone() rather than a synchronous usb_unlink_urb().

Again, I agree about the auditing.  It would be a big job, though.  Just
for reference, here's a complete list of all source files under
drivers/usb that use URB_ASYNC_UNLINK:

        misc/auerswald.c, misc/usbtest.c
        net/*.c
        serial/keyspan.c
        storage/tranport.c

Every other driver that does an unlink must necessarily be using 
synchronous unlinking.  That's a lot of them!


> Duncan Sands wrote:
> >
> > This just came up in devio.c: the "synchronous" call to usb_unlink_urb
> > was returning immediately with -EBUSY (probably because the core
> > just nuked the urbs) even though the completion handlers hadn't
> > finished.
>
> That was with uhci-hcd, right?  To support Greg's wish to support
> a "fire-and-forget" urb management policy, I think we really need
> to have a uhci_endpoint_disable() that's waiting for those urbs.

On Mon, 8 Dec 2003, Duncan Sands wrote:

> yes it was uhci.  You're pointing out that the uhci implementation of
> urb nuking is not complete, right?  I'd forgotten about that, and indeed
> it would explain how this panic was able to occur.

When the code freeze ends, I will submit my version of such a routine.  
But as far as I can see, there is _no_ statement in the kernel source
about what struct usb_operations->disable() is supposed to do.  The
comments in front of one implementation in hcd.c (hcd_endpoint_disable)
merely state that all endpoint state is gone from the hardware.  Since 
UHCI doesn't store any endpoint state in the hardware to begin with, 
that doesn't say anything.  _Nothing_ is mentioned anywhere about 
waiting until all the software/driver state is gone as well.


On Mon, 8 Dec 2003, Duncan Sands wrote:
 
> better to be explicit I think: provide a routine for halting the endpoint
> (I guess this exists already).  Then people who resubmit in the completion
> handler can do:
> 
> halt the endpoint (all submissions to the endpoint will fail).
> unlink the urb (synchronously)
> restart the endpoint

Restart wouldn't be necessary if this was being done as part of a 
diconnect routine.  The core will take care of it automatically.

> If I understand right, you are proposing to bundle the "halt the endpoint"
> part into usb_unlink_urb (sync)?

On Mon, 8 Dec 2003, Oliver Neukum wrote:

> No, it seems to me he wants a "halt the urb" function.

Oliver is right.  My suggestion merely prevents the URB being unlinked
from being resubmitted.  It would not prevent a different URB from being
submitted to the same endpoint.  I think that's a sensible approach.  
People do sometimes want to perform a synchronous unlink at times other
than disconnect.


> But that's not really an issue. All you need to do is to spin on a running
> completion handler. After that either the completion handler has run,
> which means the assumption drivers make is safe to make, or you
> can unlink the newly submitted urb.
> Not quite so trivial to implement without races, but there are _no_
> changes to existing semantics.

You're correct that this can be done by writing the driver appropriately.  
My point is that a lot of drivers would need to be changed, and it's
easier to change the core and alter the semantics of the API.


On Mon, 8 Dec 2003, Duncan Sands wrote:

[referring to David Brownell's proposed urb_begone()]:
> why not have usb_unlink_urb (sync) do this?

If the urb was resubmitted by the completion handler, urb_begone() as
written would loop endlessly.


On Mon, 8 Dec 2003, David Brownell wrote:

> But if usbcore were to get changed, why not change it
> to have a usb_urb_begone() call that MUST (eventually)
> be used for all synchronous unlinks?  That's a better 
> long-term approach IMO.

I like that idea.  In fact, why not make usb_unlink_urb() be purely async?  
Drivers that want to wait for the URB to finish completely could then call
usb_urb_begone() after unlinking it.  (The implementation could perhaps be
spruced up a little.  Something like my usage_count field should be used;
there's already a FIXME about improving the URB state in hcd.c.  And it 
wouldn't hurt to use a blocking primitive rather than polling every 
1/100th second.  There could even be an interruptible version, although 
I don't know that there's any need for such a thing.)

> Thing is, I really do think that every device driver
> using synchronous unlink models should get checked  
> to make sure it's not got other disconnect() bugs   
> lurking; they've been a longstanding problem.  And at
> least some  checking would be done as part of making 
> synchronous unlink calls use a different function.   

I think every device driver, period, should be audited.  Not just the ones 
that do synchronous unlink, although that's a particularly fertile source 
of errors.  But looking through driver sources I have seen a lot of bad 
code.


On Mon, 8 Dec 2003, Oliver Neukum wrote:

> There needs to be synchronisation, but the core can provide it or at least
> most of it. You just need to make ordinary completion and usb_unlink_urb()
> atomic to each other. In this case a simple conditional on the urb's status
> is sufficient synchronisation.

Another important area where better synchronization is needed in the core 
is submission/unlinking.  An URB should be either:

        idle (not used at all or just beginning the submission process
                but not yet linked by the HCD),
        in progress (linked by the HCD, no errors encountered yet, not
                unlinked, not completed),
        or finishing up (error, unlink, or in completion handler).

These state changes should be protected by urb->lock.  But they aren't, 
and we currently can't distinguish between idle and finishing up.

My proposal would make thise categories reliably detectable:

        idle:           urb->usage_count == 0,
        in progress:    urb->usage_count > 0 and urb->status = -EINPROGRESS,
        finishing up:   urb->usage_count > 0 and
                                urb->status != -EINPROGRESS.


All right.  I was hoping to stir up a little discussion, and apparently I 
succeeded.  It would be nice to arrive at a consensus regarding a few key 
points:

        Should the core change to prevent resubmission of URBs under
        certain circumstances?

        Should URB state tracking be more reliable?

        Should we try to fix the existing synchronous unlinking problem
        by auditing drivers or by altering the core to make it match more
        closely the drivers' preconceptions?  Should we change them both 
        a little?

Alan Stern



-------------------------------------------------------
This SF.net email is sponsored by: IBM Linux Tutorials.
Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to