On Mon, May 13, 2002 at 12:56:23PM -0700, David Brownell wrote:
> > > > But I'll correct a few misstatements right now about the
> > > > current 2.5 behavior.  (And for what it's worth, only "uhci.c"
> > > > has any problems with that -- from what I've seen.)
> > > > 
> > > > > - It made the USB code more complicated as a whole with the introduction
> > > > >   of an additional cleanup path for devices. 
> > > > 
> > > > Not so.  It has detected/reported failures to correctly support
> > > > the _original_ cleanup paths, though.  For example, device
> > > > drivers that hang on to references after their disconnect()
> > > > methods complete.  (Which can make "rmmod" break, for
> > > > example completion callbacks going to where the module
> > > > used to be loaded.)
> > > 
> > > Hm, we should probably fix that.  We should test for that in the usb
> > > core before passing the call down to the host controller driver.
> 
> Simple fix:  don't merge Johannes' changes.
> 
> 
> > That's a device driver bug. If it's still hanging onto references and
> > the module is removed, it's a bug.
> > 
> > That has nothing to do with the core, or HCD's.

No, I think this should be allowed.  If a driver has a reference, it
should not just disappear underneath it.  Yes, you could argue that the
driver is notified of when it disappears, but this is a place that can
be _very_ prone to nasty race conditions.  See the torrid history of the
visor.c driver for a good example of how long it has taken to finally (I
hope) get this logic right!  I do not want any other driver developer to
have to go through that kind of hell...

And yes, I don't think the device driver should be able to successfully
_do_ anything with that reference (like submit an urb, etc.), but the
core can be modified to nicely return an error, so the driver can
successfully recover.

I _much_ prefer doing a little extra work in the core code, than forcing
_all_ device drivers to duplicate this logic.  Again, see the visor
driver for an example.

But this can be added in the (near) future, it isn't stopping anything
right now.

> Johannes, your arguments have boiled down to not wanting
> the core to detect or prevent such oopses.  I really don't
> understand why, since the cost to correctly written device
> drivers is zero and the _value_ of detecting/preventing such
> problems is significantly higher.  (Measurable in years that
> such bugs have otherwise been creating random problems
> before someone finally figures out what's wrong.  Consider
> cases like "printer" and "usbfs".)

printer and usbfs has not had usb_dev problems in the past, right?
I want the core to prevent oopses, and I'm pretty sure these two patches
now do this, correct?

With the exception that if a device driver grabs a usb_get_dev()
reference, and doesn't release it properly, but we've always had this
problem :)

> > > So should we add a BUG() call if_interrupt() is true when we go to
> > > clean up the finally usb_free_dev() call to catch for this?
> 
> Again, simple fix:  don't merge Johannes' changes.

Sorry, I'm going to rule no.  Unless a bug in the existing code can be
found (and then I'm just probably going to fix it :)

> > NO, NO, NO. usb_free_dev() is called when the last reference is
> > decremented, which can be in ANY context!
> > 
> > But it doesn't matter since usb_free_dev doesn't do any blocking.
> 
> The HCD is most _certainly_ allowed to block when cleaning up
> device state.  Every device management API I've had reason to
> look at in Linux guarantees that the the "clean up after this device"
> call can block.
> 
> I don't care at all about the memory management, except that as I
> said elsewhere, I can't see any case where the device memory
> should reasonably be kept around after the cleanup call is made.

Ok, a BUG() call if in_interrupt() is ok with you?  I don't see what it
would hurt (this isn't exactly speed sensitive code.)

> > > > One thing to keep in mind is that to the extent that many
> > > > device drivers _today_ are working "correctly", they're not
> > > > going to trigger such bugs.  However, Johannes' change
> > > > makes the kernel more fragile in the face of newer drivers
> > > > which do have bugs in such areas.
> > > 
> > > If we add the BUG() mentioned above, how are things more fragile?
> > > Personally, they look more stable to me.
> 
> Today (2.5.15) the BUG() test is being made.  That's how I can
> say that they're not going to trigger such bugs, and how overall
> stability is better today.

No, 2.5.15 does not allow the host controller driver to free it's
reference when it wants to, hence the uhci.c problem.  If I had been
more thorough when you had originally submitted this patch and tested
with uhci.c, I would not have accepted it as such.  Very sorry about
this.

> And the ONLY lack of stability is for folk using "uhci.c" ... or for
> new drivers that haven't gotten rid of disconnect() bugs yet. Which
> IMO is a good sign, particularly since the updates to "uhci.c" are
> so simple, and since the new device drivers will get nice clean
> BUG() reports if they screw up.

No, I want the model to be easier, so that we don't have these kinds of
arguments before.  Much like the "is it ok to free an urb right now"
kind of mess we had before I added proper reference counting for them.
See, no more argument, or worries about doing it at a "wrong" time.
Everything just works fine (and you get to do fun things like the 2.5
version of visor.c shows :)

> > Just code the drivers correctly.
> > 
> > The kernel is not the place to coddle coders who cannot code correctly.
> > It's a dangerous place and it'll remain so for a long time.
> 
> Likewise, the kernel is not the place to design-in fragility for the
> component interconnections.  (Fragility like hiding errors.)
> 
> It's a place where good coding discipline -- like defining and preserving
> invariants of all kinds, like the BUG() tests you object to -- is well
> rewarded in terms of system stability.

It's also the place that _NEEDS_ good reference counting for structures
that are handled by multiple threads of execution.  We now have this,
with almost no special cases (disconnect() being the current one, but
that's always existed, I'll fix it...)  This causes cleaner, more robust
code.

Again, I don't see a difference in this from the usb_get_urb()
usb_put_urb() arguments.  And since that code has been in, I haven't
heard any problems or discussions that used to crop up quite frequently
here.

thanks,

greg k-h

_______________________________________________________________

Have big pipes? SourceForge.net is looking for download mirrors. We supply
the hardware. You get the recognition. Email Us: [EMAIL PROTECTED]
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to