On Fri, 13 Sep 2013, Bjorn Helgaas wrote:
> Thanks. Maybe this is more relevant than I thought. I'd sure like to
> copy your strategy rather than reinvent something.
Well, I don't know if this will really end up being all that relevant
for PCI, but you're welcome to steal any ideas you like. :-)
> > Very briefly, usb_device_read() dereferences bus->root_hub:
> >
> > usb_lock_device(bus->root_hub);
> >
> > On the other hand, usb_remove_hcd() calls both
> >
> > usb_disconnect(&rhdev);
> >
> > and
> >
> > usb_put_dev(hcd->self.root_hub);
> >
> > (note that rhdev == hcd->self.root_hub == bus->root_hub), which
> > unregisters and deallocates the root_hub structure. Something has to
> > prevent usb_device_read() from accessing the structure after it has
> > been unregistered and freed; that's where usb_bus_list_lock comes in.
>
> I hadn't paid attention to hcd->rh_registered before, but that seems
> important. When iterating over usb_bus_list, usb_device_read()
> ignores buses with hcd->rh_registered == 0.
Right. In theory this could happen either because the root hub hasn't
been registered yet or it has already been (or is about to be)
unregistered.
> usb_remove_hcd() clears hcd->rh_registered before deallocating the
> root_hub with usb_put_dev(). I think the mutex_lock/unlock between
> clearing rh_registered and deallocating root_hub will ensure that
> usb_device_read() cannot see rh_registered == 1 and a deallocated
> root_hub structure.
Exactly. You've got it.
> But that seems more like a memory barrier issue
> than a locking issue, and I don't yet see how usb_disconnect() is
> related.
It's more than just a memory barrier. Suppose usb_device_read() sees
that rh_registered is set. Then we don't want usb_remove_hcd() to
deallocate the root hub until usb_device_read() is finished using it,
and the way to prevent that is by making usb_device_read() hold
usb_bus_list_lock somewhere in between setting rh_registered back to 0
and the final usb_device_put().
The fact that this "somewhere" surrounds a call to usb_disconnect() is
more or less irrelevant. It may have been more important in earlier
kernel versions (i.e., before hcd->rh_registered was invented), I don't
remember. In theory, it would now be okay to move the usb_disconnect()
call after the locked region, but it would look kind of strange to
acquire the mutex and immediately release it without doing anything in
between. I guess the next best thing would be to hold
usb_bus_list_lock instead across these lines:
spin_lock_irq (&hcd_root_hub_lock);
hcd->rh_registered = 0;
spin_unlock_irq (&hcd_root_hub_lock);
That at least would make sense, and it would point out that
hcd->rh_registered really is protected by both the mutex and the
spinlock.
> I guess I'm hung up because I don't know how managing the set of root
> hubs is different from managing usb_bus_list. Maybe it has something
> to do with the device_add() call in register_root_hub() and the
> device_del() call in usb_disconnect(). We hold usb_bus_list_lock in
> both cases, but neither does anything with usb_bus_list. Or maybe it
> has something to do with the visibility of stores to
> hcd->rh_registered?
Basically yes. We need to prevent writes to hcd->rh_registered in both
interrupt context (for example, in usb_hc_died()) and process context
(usb_device_read()). Therefore it has to be write-protected by both a
spinlock and a mutex: hcd_root_hub_lock and usb_bus_list_lock.
The bus list, by contrast, is used only in process context. Therefore
the mutex suffices to protect it.
Does this all make sense? I realize it's rather obscure, and it
doesn't help that the usages changed after the names were chosen.
Alan Stern
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html