At Tue, 25 Apr 2006 11:45:18 -0600, "Christopher Nelson" <[EMAIL PROTECTED]> wrote:
> I like hard real-time systems. I have thought a lot about the recovery
> aspect of system design. To me it seems like you have two situations:

Can you give us some references to prior work on this topic that is most relevant here? Papers, theses, etc.

> This might be extended to IPC by doing something similar. It may not
> ever be necessary to know "when" to stop retrying. It may be possible
> to indicate to a user that the requested operation is taking longer than
> expected, and to give the user the opportunity to cancel the request.
> Other servers (such as a mail server) may have a settings file which
> dictates how "long" it should keep retrying an operation.
>
> In these situations, the metric for timing out may not be some
> compile-time constant, but can be dependent on what the user has said
> should happen. (In the case of a settings file, it is probably a
> "knowledgeable" user, since all servers should come set with reasonable
> defaults.)
>
> One other idea that may not be feasible is in regards to timeouts being
> flaky in the case of heavy load. Perhaps it would be better to
> stipulate that the watchdog should keep track of how many requests have
> been processed, and how many are pending. Over time this indicates an
> "average load". If this number starts to rise sharply, the watchdog may
> assume that it is now under a heavier load, and can use some metric to
> back off on its abort policy. Think about how Ethernet cards use
> binary exponential backoff to make sure only one system is transmitting
> at once, without any explicit session policy.
>
> Essentially, apps and servers need to be smarter and need to expect
> things to go wrong.

My concern is that in a system with such complex dynamics, there may be emergent behaviour that is totally different from what you actually want. Your binary exponential backoff is a very good example: as originally designed, it led to starvation (the Ethernet capture effect).
Jeff Mogul calls this "emergent misbehaviour", see:
http://www.cs.kuleuven.ac.be/conference/EuroSys2006/papers/p293-mogul.pdf

I really hope that we find a simpler solution, potentially by reducing the requirements.

Thanks,
Marcus
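
For concreteness, the load-aware watchdog idea quoted above could be sketched roughly as follows. This is only an illustration, not code from this thread or from any existing server: the class name, the parameters (base_timeout, max_timeout, alpha, spike_ratio) and the "sharp rise" test are all assumptions made for the example. It keeps a smoothed average of pending requests and doubles the abort timeout when the pending count jumps well above that average, which is exactly the kind of feedback loop whose combined behaviour is hard to predict.

```python
class LoadAwareWatchdog:
    """Hypothetical watchdog that relaxes its abort timeout under rising load."""

    def __init__(self, base_timeout=1.0, max_timeout=16.0,
                 alpha=0.2, spike_ratio=2.0):
        # All four parameters are illustrative assumptions.
        self.base_timeout = base_timeout  # timeout used under normal load
        self.max_timeout = max_timeout    # upper bound on the backed-off timeout
        self.alpha = alpha                # smoothing factor for the average
        self.spike_ratio = spike_ratio    # what counts as a "sharp" rise
        self.avg_pending = None           # smoothed count of pending requests
        self.timeout = base_timeout       # current abort timeout

    def observe(self, pending):
        """Record the current number of pending requests; return the timeout."""
        if self.avg_pending is None:
            # First sample just seeds the average.
            self.avg_pending = float(pending)
            return self.timeout
        if pending > self.spike_ratio * max(self.avg_pending, 1.0):
            # Load rose sharply: back off by doubling the timeout (bounded).
            self.timeout = min(self.timeout * 2, self.max_timeout)
        elif pending <= self.avg_pending:
            # Load is at or below average: tighten again, down to the base.
            self.timeout = max(self.timeout / 2, self.base_timeout)
        # Exponentially weighted moving average of the pending count.
        self.avg_pending += self.alpha * (pending - self.avg_pending)
        return self.timeout


watchdog = LoadAwareWatchdog()
for pending in [2, 2, 10, 10, 10, 1]:
    print(pending, watchdog.observe(pending))
```

Note that even this small sketch has four tunable constants, and several such watchdogs reacting to one another's delays could interact in ways none of them was designed for.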