Hi,

Bas Wijnen <[EMAIL PROTECTED]> writes:

> I am indeed. :-)  In some of my limited free time I'm currently trying to
> write a library for persistent applications, so I get a better feeling for
> what persistence is.  The idea is to create applications which checkpoint
> themselves every now and then, can be aborted at any time (or made to
> checkpoint-and-abort as an atomic operation), and when restarted continue from
> their last checkpoint.  I'm trying to design the database in a way that it is
> possible to "restart" the program with a new version of it, possibly keeping
> open the file descriptors so it doesn't even suffer from closed connections.

Did you look at libckpt[0] and similar libraries (ad: I once wrote pego[1]
which produces /portable/ checkpoints, unlike libckpt, but I'm not sure
it's interesting in this case ;-))?

File descriptors and capabilities are the main issue since they are bound
to state that is /external/ to the application (be it in the kernel or in
a server).  In Fluke, the authors argue[2] that kernel state should be
exportable to allow for the implementation of user-level checkpointing.
However, in a multi-server system, application state is spread across a
bunch of servers which would all have to make their state exportable.  But
from the server viewpoint, restoring complex state from an untrusted
source is not a reasonable thing to do.  Furthermore, a protected
capability system does not allow the disclosure of the "bit
representation" of capabilities[3], so checkpointing capabilities
themselves in a meaningful way is not something applications can do on
their own.

In EROS, the whole system (kernel - drivers + all the processes) is
persistent, so there is, I think, no such problem: each checkpoint
contains everything that's needed to restore the whole thing.  The issue
of restoring capabilities and their associated state arises when trying to
make only part of the processes persistent.
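
To make the "external state" point concrete, here is a rough sketch of a
self-checkpointing program in Python.  It is only an illustration under
my own assumptions: LineCounter, checkpoint(), restore() and demo() are
hypothetical names, not part of libckpt, pego or any Hurd interface.  The
open file is not saved itself; the checkpoint only records the path and
offset needed to reopen it.

```python
import json
import os
import tempfile


class LineCounter:
    """Toy persistent application: counts the lines it has read so far."""

    def __init__(self, path, offset=0, lines_seen=0):
        self.path = path
        self.lines_seen = lines_seen
        # The open file is external state: it lives in the kernel,
        # so it has to be re-created rather than restored.
        self.stream = open(path, "rb")
        self.stream.seek(offset)

    def step(self):
        line = self.stream.readline()
        if line:
            self.lines_seen += 1
        return line

    def snapshot(self):
        # Save only what is needed to rebuild the descriptor later.
        return {
            "path": self.path,
            "offset": self.stream.tell(),
            "lines_seen": self.lines_seen,
        }

    def close(self):
        self.stream.close()


def checkpoint(app, ckpt_path):
    """Write the checkpoint to a temporary file, then rename it into place."""
    tmp_path = ckpt_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(app.snapshot(), f)
        f.flush()
        os.fsync(f.fileno())
    # An abort before this rename leaves the previous checkpoint intact.
    os.replace(tmp_path, ckpt_path)


def restore(ckpt_path):
    """Rebuild the application, reopening the file at the saved offset."""
    with open(ckpt_path) as f:
        return LineCounter(**json.load(f))


def demo():
    workdir = tempfile.mkdtemp()
    data = os.path.join(workdir, "input.txt")
    ckpt = os.path.join(workdir, "state.ckpt")
    with open(data, "wb") as f:
        f.write(b"one\ntwo\nthree\n")

    app = LineCounter(data)
    app.step()                # reads "one"
    checkpoint(app, ckpt)
    app.step()                # work done after the checkpoint is lost
    app.close()               # simulate an abort

    resumed = restore(ckpt)
    next_line = resumed.step()
    result = (next_line, resumed.lines_seen)
    resumed.close()
    return result
```

Calling demo() returns (b"two\n", 2): the restarted program continues from
its last checkpoint.  This only works because a plain file can be reopened
by name from inside the application.  A socket, or a capability whose
state sits in another server, offers no such handle, which is exactly the
problem described above.
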
One solution would be to log all the interactions between the persistent
world and the non-persistent world in order to replay them upon recovery,
but that's quite ugly IMO.  Instead, maybe special support from the
capability system could solve that.

Good luck, and best wishes!  ;-)

Ludovic.

[0] http://www.cs.utk.edu/~plank/plank/www/libckpt.html
[1] http://www.laas.fr/~lcourtes/software/ego/
[2] http://www.cs.utah.edu/flux/papers/iwooos96-flobs.ps.gz
[3] http://lists.gnu.org/archive/html/l4-hurd/2005-10/msg00010.html

_______________________________________________
L4-hurd mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/l4-hurd
