On Thu, 17 Feb 2005, David Brownell wrote: > On Wednesday 09 February 2005 12:23 pm, Alan Stern wrote: > > > As far as usbcore is concerned, the states an HC can be in are: > > > > a. Running normally. > > > > b. Root hub suspended. > > > > c. Root hub and HC both suspended. > > > > d. Died. > > > > e. Gone (i.e., after usb_deregister_bus). > > There are also states where it's transitioning. For example, while > suspending it should still be OK to complete requests, but not start > new ones (the children should already be suspended anyway
What if CONFIG_USB_SUSPEND isn't set? Yes, I know we're trying to move away from this possibility. But for the moment we still need to handle it correctly. > ); but once > suspended, neither is allowed. (Bear in mind that we may eventually autosuspend HCs, with autoresume upon demand. The "demand" might be submission of an URB. Not relevant just now, though...) > Similarly, while shutting down no more > requests should be allowed, or resuming, or restarting. Okay, you can add the transitioning states into the list if you like. It won't affect my conclusions much. > And (e) is off the edge of the picture ... there's no active HC, so it > has no state known to usbcore. If usbcore has a handle on such a > data structure, that's a BUG(). Maybe there's no state, but there is a data structure. It hangs around until the refcount drops to 0, which might not be for some time after the bus is deregistered. The existence of pointers to such things isn't a BUG, and so long as the pointers exist they can be dereferenced. > > For the most part, the tests are intended to prevent actions from taking > > place when the HC is not in an appropriate state. The big problem is that > > the state transitions are not synchronized in any way with the tests. > > At one point, it was true that all the tests in usbcore could handle > the transitions locklessly, because of the nature of the transitions > and the fact that the HCDs had explicit tests on the submit path. Maybe we need to adopt that kind of strategy again. > > This isn't surprising, since only the HCD is in a position to do such > > synchronization. > > > > States a and b don't require any special action at all. Eveything is > > legal in those states. > > Submitting urbs shouldn't be allowed in any of those states except > "running normally". I can't tell if you are agreeing with me or disagreeing. State a _is_ "running normally". State b is "root hub suspended", which means the root hub isn't running normally but the HC is. (In state b, URBs won't be submitted unless CONFIG_USB_SUSPEND isn't set -- but we do need to allow for that possibility. Especially for the root hub polling URB.) > > Assuming drivers are careful not to use a device after being > > unbound from it, we only need to worry about the small window between the > > time the controller dies and the time the devices are disconnected. But > > notice that no matter what we do, we can't handle the even smaller window > > between the time the controller dies and the time hcd->state is set to > > HC_STATE_HALT. So tests 1-4 don't do much for d and e. Of course 5 is > > useful for d, even without precise synchronization. > > When it dies, the HCD ought to set the state with its internal > data structures locked. At one point I checked all the code to > make sure the HCD's spinlock was held whenever hcd->state was set. "When it dies..." The HCD can't do anything at the moment the HC dies, because it won't learn about the death until the next interrupt comes. Obviously. Before that interrupt arrives the HCD has to be able to handle requests even though the HC is dead. That's true whether locking is used or not. Furthermore, what good does it do for the HCD to hold a spinlock while setting hcd->state if usbcore doesn't hold the same spinlock while testing it? I'm concerned about making usbcore work right -- my assumption is that the HCDs are already in good shape. > > We should consider d-6: detecting that the HC has died. I'm not so sure > > that we're doing it the best way. First, an HCD might decide that the HC > > is dead when a watchdog timer fires, not when an interrupt is received. > > Second, it should be just about as easy for an HCD to call usb_hc_died > > directly as to set hcd->state. So action 6 could be eliminated. > > If the HCD sets the state with its internal spinlock held, then it > doesn't matter whether the IRQ is from a timer or the controller. Yes it does. It matters because if the IRQ is from the controller then usb_hcd_irq will automatically call usb_hc_died, but if the IRQ is from a timer then the HCD has to call usb_hc_died. _If_ the HCD would call usb_hc_died in both cases, _then_ it wouldn't matter. And we could simplify usb_hcd_irq. > There was also the issue that HALTED state can be entered through normal > state transitions ... "died" was supposed to be an abnormal transition. True, but I don't think it's important for the purposes of this discussion. > > So the whole thing boils down to state c. While the HC is suspended is > > when we really want to avoid trying to access the controller. However > > lack of synchronization means that we aren't really doing this correctly > > now. Either we need to find a way of preventing 1-4 precisely while the > > HC is suspended or we might as well give up and just rely on the HCDs to > > do the necessary synchronizing. Whatever this way is, it will have to > > involve the HCDs somehow because the suspend/resume callbacks for non-PCI > > HC devices don't pass through usbcore. > > Remember though that there are supposed to be system-wide invariants for > the device tree during suspend processing. Specifically, that no device > gets suspended after its parent, or resumed before it ... so how would > one of those operations 1-4 get past the usb_device checks?? Well, if CONFIG_USB_SUSPEND isn't set then we certainly _can_ have HCs and root hubs suspended without any of their children being suspended. And operation #4 is get_frame_number, which doesn't have any usb_device checks since it's not associated with a USB device. > > My conclusion is that on the whole, hcd->state can be eliminated from > > hcd.c. The exceptions are: > > > > hcd->state, or something like it, will be useful to prevent IRQ > > forwarding at times when we know the HC isn't active. > > For PCI, the IRQ handlers get deregistered during suspend. There's a > small window between requesting the IRQ and calling hcd->start(), and > likewise after stop() and before freeing it. That doesn't mean the test is useless. Even when the HC is incapable of generating IRQs, there might be another device sharing the same IRQ line. We would like to prevent those other interrupts being sent to the HCD, just as a simple optimization. > > Those two warnings (7 and 8) seem to need it. But do we need > > the warnings? I don't think I've ever seen them triggered. > > We don't want to see #7, those tend to indicate ACPI or other > IRQ setup problems ... the point of #7 is to try to make sure > that more appropriate people handle the messages! You may not > have seen them, but they get reported regularly enough. (And > then the response is to go talk to the folk who handle the IRQ > setup/mapping.) We used to get an _extremely_ annoying number > of "USB doesn't work" problems with this root cause; no more! > I'd be opposed to removing this message, it's been a huge win. #7 is warning about unlink while the HC isn't running. This means the HC is either suspended, dead, or stopped because the HCD is being unloaded. Is it safe to say that the first two aren't common in bug reports and don't reflect ACPI or IRQ setup problems? The third should never happen at all; it would mean unlinking an URB after the device had been disconnected and the endpoint disabled. Maybe you're thinking of a different warning message, the one that warns about unlinking before an IRQ was received? > Re #8 that's just to catch any bugs internal to usbcore/hcds; > as a rule, it's safe to treat all WARN_ON(x) as assertions > about how things are expected to behave. Which means that if > they ever appear, there's a problem to fix ASAP ... this one > isn't like #7. It isn't like the "Unlink after no-IRQ" warning, but it is like #7. > > Blocking calls to the HCD while the HC is suspended is the > > single major stumbling block. It should be done properly > > (which hcd->state can't do) or not at all. > > I'm not sure what you mean by that. Again, a lot of those > tests are more at the level of integrety assertions ... they > don't fire, we'd worry if they did. The warnings #7 and #8 are like that. But #1 - #4 aren't: preventing URBs being submitted, preventing root hub polling, and preventing get_frame_number whenever the HC isn't running. They are not integrity assertions. #5 is more of an optimization: don't forward an IRQ when we know the HC didn't generate it. And #6 is merely a convenience: call usb_hc_died on behalf of the HCD so the HCD doesn't have to. Alan Stern ------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel