On Wed, 4 Aug 2004, David Brownell wrote: > > No, no. Preventing tasks from interfering with another is absolutely > > necessary. I'm saying that even if hub_events() called > > down(&hdev->serialize) instread of locktree(), we would still have that > > prevention. > > No we wouldn't. Consider a task suspending a tree; now khubd > comes in and wakes something up that's already been carefully > suspended. You're allowing interference ... encouraging it.
In fact that can't happen; khubd can't wake up suspended devices. (To be safe, I'd better say that it couldn't before Greg reverted some of my patches...) But demonstrating this assertion would involve special-case reasoning of the sort you're not interested in, so I won't do it. Instead I'll show that your ideas and algorithms don't achieve your goals, and suggest some improvements... > Yet I've demonstrated a locking policy that works the same in > all those cases, without claiming that, for example, reset() is a > special case "because the devices vanish". (Because in fact > they don't necessarily vanish, and it might be desirable to > arrange things so it doesn't look as if they did... without needing > to re-whack locking again first!) Your policy isn't correct, or at least, it doesn't do what you seem to be claiming it does. To see why, let's go back to first principles. You are concerned about "non-data signalling" -- which I take to mean any action that, when applied to a hub, can affect the entire subtree of devices below that hub. (There's not much point in considering the subtree of devices below a non-hub, which is why I use this phrasing.) A better name might be "subtree operations"; if you don't mind I'll start using that name in place of the other. When you think about it, lots of things qualify as subtree operations. Suspend, resume, and reset certainly. But also create-device, bind, set-configuration, unbind, and disconnect... Just about every operation for which a lock is needed. IIRC the only remaining situation where we require locking is to prevent topology changes during a tree traversal, and since that involves locking everything starting from the root it essentially does the same sort of thing as locktree() anyhow. (Why I included device creation and binding above may not be clear. Creation is a degenerate case; when a hub is created it doesn't have any children, so creating it _does_ affect the entire subtree below it! Binding is less trivial, although it's also true that when a hub is bound to the hub driver it can't have any children. More importantly, during binding and probe() a driver may need to issue a device reset. The lock already acquired can't be released, hence the type of locking needed for binding must be the same as the type required for reset, i.e., something appropriate for subtree operations.) Faced with this analysis, you may agree and go on to say that it simply shows that _all_ our locking should use locktree(). But that's not right, because your procedures for suspend/resume -- which _do_ use locktree() -- don't accomplish what you want. That is, they don't provide general-purpose serialization with no need to consider special cases. To understand why this is so, let's use a particular example. Take this hypothetical device tree: _____A_____ / | \ _B_ E _G_ / \ | / \ C D F H I Now suppose task T1 decides to suspend the subtree starting at G. T1 calls locktree() to lock G, then locks H, suspends it, and unlocks it. Then it locks, suspends, and unlocks I. Finally it suspends and unlocks G. While all that is going on, suppose task T2 decides to suspend the entire tree starting at A. It uses locktree() to lock A. Then it locks B, then it locks, suspends, and unlocks C & D. Then it suspends and unlocks B. And so on... It doesn't block until it tries to lock G, at which point it has to wait for T1 to finish up. Observe: T1 and T2 are operating concurrently on overlapping subtrees. At the same time the T1 is suspending H, I, and G, T2 might be suspending C, D, and B. This example demonstrates that your algorithm allows concurrency; it does _not_ achieve complete serialization. What kind of concurrency is allowed? Answer: a suspending task (T2) can operate while another, previously started task (T1) manipulates a sub-sub-tree, up until the time when it needs to touch the same devices. Put more simply, a suspending task allows other sub-subtree operations on nodes it hasn't touched yet. The locktree algorithm will prevent other tasks from working on the devices after they are suspended, until T2 releases its lock on the topmost node, A. It has to be this way; otherwise someone might resume something below A before T2 suspends A, and that would certainly lead to trouble. One might claim that this isn't so bad, that the intermediate stage of T2 when it has suspended C, D, and B (for example) means that it has been operating on a sub-subtree disjoint from the H,I,G sub-subtree. That sort of argument doesn't wash when considering resume(). Since resume() works on devices from the top down, during its intermediate stages the devices it has affected don't even form a subtree! Now suppose for the sake of argument there was some subtree operation (call it SOP) which had to prevent other tasks from working on devices _before_ SOP had touched them but which was okay for concurrent execution on devices _afterwards_. (There aren't actually any operations exactly like this although resume comes close; it's not quite a time-reversed version of suspend. But that shouldn't matter, since you're not interested in special-case considerations -- the locktree approach is supposed to work no matter what the nature of the subtree operation.) The code used by suspend won't work for SOP. T2 running SOP starting at A, for example, would need to explicitly lock all the nodes before it could start doing anything. Otherwise the possibility would exist that the previously-started T1 could still be doing something below G, which would be bad. Furthermore, once T2 finished applying SOP to some nodes and they were available for other work, there wouldn't be any way for another task to do anything to those nodes until T2 released its lock on A. So legal concurrency would be ruled out. To summarize, your locktree() algorithm allows some forms of concurrency but not others, and it doesn't prevent certain forms that might be undesirable. Taken together, these observations suggest an alternative to the locktree approach. Consider the following simple rule for serializing subtree operations: Concurrency is allowed whenever it is _safe_ (which must be determined according to the nature of the particular operations acting concurrently). Serialization is achieved simply by locking device nodes. With this rule there is no need for locktree() at all. Here's how it would work... Right now we have four different subtree operations to worry about: suspend, resume, reset, and disconnect. Everything else reduces to disconnect, and since resetting a hub disconnects all its children, reset reduces to disconnect also. So we only need consider suspend, resume, and disconnect. (If in the future we create a version of reset that _doesn't_ disconnect a hub's children, this analysis would have to be expanded.) Resume and disconnect are actually quite simple. They have _no_ serialization requirements. You can do whatever you like to a suspended device before resume() sees it or to the device after it has been resumed, with no harm at all. Likewise, you can try to do whatever you want (although you won't get very far!) to a NOTATTACHED device before disconnect() sees it with no harm done. Obviously you can't do anything to such a device afterward. So for both resume and disconnect, it suffices to lock a device node while processing its children (to prevent topology changes) and while processing the device itself. Suspend is harder because we can't allow anything to happen to the suspended devices in a subtree until the entire operation is complete. On the other hand, there's no problem about doing something to devices in the subtree _before_ suspend() reaches them. So the algorithm should go like this: Lock a device, then recursively suspend its children, then suspend the device. When the entire subtree has been suspended, release all the locks starting from the bottom and going up. These algorithms provide safe concurrency, exactly when and as desired. Each device is locked precisely during the time that another subtree operation on it cannot be allowed. Okay, now maybe you'll object that I really am using special case reasoning here, just pushed down a level. Fine. If you really want something that is truly general-purpose and can be used for _any_ subtree operation without regard to the details of what it does, there's only one possibility: Every subtree operation must lock the entire subtree before doing anything and must complete its work before releasing any locks. In other words, complete serialization. Any alternative will allow some form of concurrency, which could be bad for some types of subtree operations. This is the only approach guaranteed to work for any operation, regardless of its detailed nature. Doing things this way wouldn't really be all that bad. Yes, it would degrade performance a little, but not all that much and we don't care about performance at this stage anyway. You mentioned that getting all the locks at once would be slower... but that's not a concern. Notice that this approach also avoids using locktree(). There you have it. This is why I think locktree() doesn't really do what you want and isn't necessary in the first place. Alan Stern ------------------------------------------------------- This SF.Net email is sponsored by OSTG. Have you noticed the changes on Linux.com, ITManagersJournal and NewsForge in the past few weeks? Now, one more big change to announce. We are now OSTG- Open Source Technology Group. Come see the changes on the new OSTG site. www.ostg.com _______________________________________________ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel