Kernel-based checkpoint and restart

By Jonathan Corbet
August 11, 2008

Your editor, who has carefully hidden several years of experience in Fortran-based scientific programming from this readership, encountered checkpoint and restart facilities a long time ago. In those days, programs which would run for days of hard-won CPU time on an unimaginably fast CDC or Cray mainframe would occasionally checkpoint themselves, minimizing the amount of compute time lost when (not if) the system went down at an inopportune time. It was a sort of insurance policy, with the premiums being paid in the form of regular checkpoint calls.

Central processor time is no longer in such short supply, but there is still interest in the ability to checkpoint a running application and restore its state at some future time. One obvious application of this capability is to restore the application on a different machine; in this way, running applications can be moved from one host to another. If the "application" is an entire container full of tasks, you now have the ability to shift those containers around without the contained tasks even being aware of what is going on. That, in turn, can provide for load balancing, or just the ability to move containers off a machine which is being taken down.

Linux does not have this capability now. Anybody who thinks about adding it must certainly find the prospect daunting; applications have a lot of state hidden throughout the system. This state includes open files (and positions within the files), network sockets and pipes connected to remote peers, signal states, outstanding timers, special-purpose file descriptors (for epoll_wait(), for example), ptrace() status, CPU affinities, SYSV semaphores, futexes, SELinux state, and much more. Any failure to save and properly restore all of that state will result in a broken process. It is no wonder that Linux does not do checkpoint and restart; most rational developers would be driven away by the complexities involved in making it work in an even remotely robust manner.

But, then, there was a time when rational programmers would not have attempted the creation of Linux in the first place. So it should not be surprising to see that developers are working on the checkpoint and restart problem. The latest attempt can be seen in this patch set posted by Dave Hansen (but originally written by Oren Laadan). It is far from being ready for prime-time use, but it does show the sort of approach which is being taken.

For some time, the prevailing wisdom was that checkpoint and restart should be pushed as much into user space as possible. A user-space process could handle the marshaling of process state and writing it to a file; the kernel would only get involved when it was strictly necessary. It turns out, though, that kernel involvement is needed fairly often, requiring the addition of "lots of new, little kernel interfaces" to make everything work. So, at a meeting at OLS, the checkpoint/restart developers decided to take a different approach and move the work into the kernel. The result is the creation of just two new system calls:

    int checkpoint(pid_t pid, int fd, unsigned long flags);
    int restart(int crid, int fd, unsigned long flags);

A call to checkpoint() will write an image of the current process to the given fd. The pid argument identifies the init process for the current process's container; it is saved to the image but not otherwise used in the current patch. If the operation succeeds, the return value will be a unique (until the system reboots) "checkpoint image identifier". restart() reverses the process; crid is the image identifier, which is not currently used. The flags argument is currently unused in both system calls. These interfaces seem likely to change; future enhancements to the interface are likely to include capabilities like checkpointing other processes and groups of processes.
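For the curious, a user-space program might reach these calls through syscall() along the lines of the sketch below. Everything beyond the two prototypes is an assumption: the names sys_checkpoint() and sys_restart() are made up for illustration, and no system call numbers have been assigned, so __NR_checkpoint and __NR_restart are placeholders. On a kernel (or set of headers) without the patch, both wrappers simply fail with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical: these numbers do not exist in any released kernel.
 * A negative placeholder means "not available on this system".
 */
#ifndef __NR_checkpoint
#define __NR_checkpoint (-1L)
#endif
#ifndef __NR_restart
#define __NR_restart (-1L)
#endif

/* Write an image of the calling process to fd; returns an image id or -1. */
int sys_checkpoint(pid_t pid, int fd, unsigned long flags)
{
    assert(fd >= 0);            /* a negative fd is a caller bug */
    if (__NR_checkpoint < 0) {
        errno = ENOSYS;         /* headers know nothing about the call */
        return -1;
    }
    return (int)syscall(__NR_checkpoint, pid, fd, flags);
}

/* Restore process state from the image readable on fd; returns -1 on error. */
int sys_restart(int crid, int fd, unsigned long flags)
{
    assert(fd >= 0);
    if (__NR_restart < 0) {
        errno = ENOSYS;
        return -1;
    }
    return (int)syscall(__NR_restart, crid, fd, flags);
}
```

A caller would open a file, pass its descriptor to sys_checkpoint() with flags set to zero, and treat a return of -1 with errno set to ENOSYS as "this kernel cannot checkpoint".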

The CAP_SYS_ADMIN capability is currently required for both checkpoint() and restart(). That is somewhat unfortunate, in that it would be nice if ordinary, unprivileged processes were able to checkpoint and restart themselves. There are some real security implications which must be kept in mind, though, especially when one considers the sort of damage that could result from an attempt to restart a carefully-manipulated checkpoint image. Making restart() secure for unprivileged use will not be a job for the faint of heart.
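To get a feel for why a manipulated image is worrying, consider a user-space toy which has nothing to do with the patch's actual image format. The sketch below saves a small, made-up state structure behind a header and refuses to restore it unless the magic number and payload length match what it expects. All of the names here (TOY_MAGIC, struct toy_header, struct toy_state, toy_checkpoint(), toy_restart()) are invented for illustration; a real restart() would have to validate vastly more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define TOY_MAGIC 0x544f5943u   /* arbitrary value, made up for this sketch */

struct toy_header { uint32_t magic; uint32_t payload_len; };
struct toy_state  { uint64_t iteration; double accumulator; };

/* Write exactly len bytes, retrying on short writes; 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; a short image (early EOF) is an error. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Save the state to fd as header + payload; 0 on success. */
int toy_checkpoint(int fd, const struct toy_state *st)
{
    struct toy_header h = { TOY_MAGIC, (uint32_t)sizeof(*st) };

    assert(st != NULL);
    if (write_all(fd, &h, sizeof(h)) < 0)
        return -1;
    return write_all(fd, st, sizeof(*st));
}

/* Restore the state from fd, rejecting images with an unexpected header. */
int toy_restart(int fd, struct toy_state *st)
{
    struct toy_header h;

    assert(st != NULL);
    if (read_all(fd, &h, sizeof(h)) < 0)
        return -1;
    if (h.magic != TOY_MAGIC || h.payload_len != sizeof(*st))
        return -1;              /* never trust sizes taken from the image */
    return read_all(fd, st, sizeof(*st));
}
```

Even in this toy, the payload itself is accepted without question once the header checks out; the kernel's version of that problem involves memory maps, register contents, and credentials.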

At this stage of development, the patch does not even attempt to solve the entire problem. It is able to save the current state of virtual memory (but only in the absence of non-private, shared mappings), current processor state, and the contents of the task structure. That is enough to checkpoint and restart a "hello, world" program, but not a whole lot more. But that is a reasonable place to start. Given the complexity of the problem, proceeding in careful baby steps seems like the right way to go. So we're probably not going to have a working checkpoint facility in the kernel in the near future, but, with luck and patience, we'll eventually have something that works.
