Marcus raises a good point, but he is missing some history. In KeyKOS, the "resume capability" served the role of a reply capability. When a node containing a resume capability was destroyed (by the space bank), a well-defined, distinguished message was sent to the recipient for all resume capabilities contained in that node.
In EROS, this behavior was dropped, and in my opinion that was a mistake. In Coyotos, there is currently no way to distinguish resume capabilities from entry capabilities, so it is difficult to duplicate the KeyKOS behavior, but see below.

In any persistent system, "notify on last capability drop" is impractical. It requires disk garbage collection, so the delay is too long to be helpful.

It would not be difficult to add a bit to an FCRB capability to support a KeyKOS-style notification. We could call it the "invoke on delete" bit. Here is the meaning of this bit:

On destruction of any object, the capability slots are examined. For any slot that contains an "invoke on delete" FCRB sender capability, a non-blocking message will be sent indicating that the capability was held within a deleted object at the time of deletion. If sending this message would block, it will not be delivered. If the FCRB sender capability is invalid, no message will be sent.

HOWEVER: This message does NOT mean that all capabilities to the FCRB are gone. It means that *some* object containing the capability has been destroyed. If there are multiple copies of the capability in different objects, and one of these objects is destroyed, the message will be sent. Programs can take deliberate steps to suppress this behavior, but this would be the normal outcome.

This is not quite the semantics that Marcus is after, but in practice it was good enough in KeyKOS. If this is sufficiently helpful to justify revising the Coyotos spec, please send a note to coyotos-dev confirming that this update should be made.

> * Whatever user program destroys the failed server process D, also
>   takes care of the users of the process D. This solution requires
>   significant structural overhead, and creates undesirable strong
>   dependency structures in the system (for example, global managers).

This solution is impossible.
The storage containing those capabilities is gone, and the party who destroys that storage usually does not have access to the content of the storage. In particular, the authority to destroy a space bank specifically does NOT include the authority to inspect storage that has been allocated by that bank. This is absolutely essential for confinement and *any* security policy.

> * The program S could use timeouts in the call to D. This solution
>   requires significant structural changes to the system design,
>   because now time becomes an important parameter in evaluating
>   services. It can be tried to argue that this is desirable anyway.

This solution leads directly to systems that fail under load. There is general agreement in both the L4 and EROS/Coyotos communities that generalized timeouts were a mistake, and that "forever" and "don't wait" are the only options that should be implemented by the IPC layer.

> * Following Mach, special "send-once" capabilities are introduced that
>   implement the send-once semantics. Here are the semantics expressed
>   in terms of Coyotos: When copied, the source capability is
>   invalidated (so the number of send-once capabilities to a given
>   object is a system invariant under capability copy operations).

The semantics of send-once rights are an abomination. Their cost is considerable, and the overhead of manipulating them correctly from the application perspective is a serious problem. Coyotos will not under any circumstances implement "send-once" or "grant-only" capabilities.

> This has the disadvantage that it makes task destruction somewhat
> more expensive...

Worse, it has the disadvantage that every capability copy must be preceded by a capability type check, so that the sender knows whether it is losing the capability as a side effect. This violates encapsulation in a fairly fundamental way.
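To make the "invoke on delete" rule proposed earlier more concrete, here is a rough sketch of it as a toy Python model. This is not Coyotos code: the class names (`FCRB`, `SenderCap`, `Obj`), the `valid` and `ready` flags, and the `DELETED_NOTICE` payload are all made up for illustration, and the real kernel structures would look quite different.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical payload; the actual message format is not specified here.
DELETED_NOTICE = "holder-deleted"


@dataclass
class FCRB:
    """Toy stand-in for a receive block (assumed fields, not the real layout)."""
    valid: bool = True   # False once the FCRB is gone, so capabilities to it are invalid
    ready: bool = True   # False if delivering a message right now would block
    inbox: List[str] = field(default_factory=list)


@dataclass
class SenderCap:
    """Toy FCRB sender capability carrying the proposed bit."""
    target: FCRB
    invoke_on_delete: bool = False


@dataclass
class Obj:
    """Any object that holds capability slots."""
    slots: List[Optional[SenderCap]] = field(default_factory=list)


def destroy(obj: Obj) -> int:
    """Destroy obj, sending notices as described; return how many were delivered."""
    delivered = 0
    for cap in obj.slots:
        if cap is None or not cap.invoke_on_delete:
            continue                 # bit not set: nothing to do
        if not cap.target.valid:
            continue                 # invalid capability: no message is sent
        if not cap.target.ready:
            continue                 # send would block: message is not delivered
        cap.target.inbox.append(DELETED_NOTICE)
        delivered += 1
    obj.slots.clear()                # the storage is gone after this point
    return delivered


# Two objects hold copies of the same capability. Destroying one of them
# notifies the server even though the other copy still exists.
server = FCRB()
a = Obj([SenderCap(server, invoke_on_delete=True)])
b = Obj([SenderCap(server, invoke_on_delete=True)])
destroy(a)
print(server.inbox, len(b.slots))
```

Note that the scan happens inside the destroy operation itself, so the party requesting destruction never gets to look at the slots.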
> 1) Is RPC robustness desirable/required, or is an alternative model
>    feasible where machine-local RPC is as unreliable as IP/UDP network
>    communication?

Yes, it is important.

> 2) If it is indeed desirable, are there more possible solutions than
>    the three approaches described above?

Yes. "Invoke on destroy of containing object".

> 3) Are the costs of destroying send-once rights (and thus sending
>    messages) acceptable? Given a positive answer to 1, and a negative
>    answer to 2, are these costs in fact unavoidable?

The costs are high and the semantics are horrible.

> 4) In fact, if we consider persistence, can not the same mechanism
>    above that was described to help with malicious or buggy software
>    be used to deal with the planned and desired removal of device
>    driver servers from the system at reboot of the persistent machine?

This is a completely different problem, and it is handled by the normal revocation logic.

> IOW: As far as I understand, EROS had a logic to restart RPCs that
> were pending and which were sent across the boundary between the
> persistent and the non-persistent world.

It did not. The design called for a mechanism by which both parties would discover that the RPC was incomplete because the communication channel had been destroyed.

shap