http://lwn.net/Articles/153203/
Suspend-to-disk is a feature desired by many Linux users; both laptop and desktop users can benefit from being able to save the state of the system to a local drive and, after a reboot, find everything as they left it. The current in-kernel suspend mechanism works for many, but not everybody is comfortable with the large amount of invasive code required. The out-of-tree suspend2 implementation adds quite a few worthwhile features, but at the cost of expanding the software suspend implementation still further. Concern over putting some of the suspend2 features into the kernel has been one of the factors preventing its merging so far.
Pavel Machek, the maintainer of the in-kernel suspend implementation, has now complicated the picture with the swsusp3 patch, which moves some of the work of suspending the system into user space. This code is said to work; if this approach continues to show promise, it could point the way toward adding suspend2's features without growing the kernel.

The software suspend process, in very rough terms, works like this:

1. All processes are frozen.
2. As much memory as possible is freed; user-space pages are pushed out to swap.
3. The remaining kernel memory is written to disk.
4. The system is powered down.
When the system is resumed, these steps are performed in reverse order, except that user-space memory remains on disk until it is faulted in by the newly restarted system.

The swsusp3 patch does not move all of the above work to user space; much of it must be done in the kernel. What does move is step 3, the writing of kernel memory to disk. This operation is handled by way of /dev/kmem; to that end, the swsusp3 patch adds a set of scary ioctl() calls to the /dev/kmem driver.

The new user-space suspend program begins by locking itself into memory. This step is required: it would not do for the program to change the memory state in the middle of the process via page faults. A call to the new IOCTL_FREEZE operation on /dev/kmem performs the first two steps listed above: freezing processes and clearing memory. The IOCTL_ATOMIC_SNAPSHOT call then puts devices on hold, creates an in-kernel list of pages which must be saved, and returns a pointer to that list. The user-space program can then obtain the list (by reading it from /dev/kmem) and walk through it. Each page on the list is read from kernel memory and written to the suspend image file. Finally, the list itself is written to the suspend image. Once that is done, the system can be powered down.

The resume process writes the saved image back into kernel memory. It has the additional problem, however, of having to deal with two kernels at once. This process will be running under a freshly booted kernel (the "resume kernel") with its own idea of the state of the world; that state will eventually be overwritten by the state from the suspended kernel, but that step must be handled carefully. The resume process cannot simply overwrite arbitrary kernel memory, since it is counting on the resume kernel to continue to function until all of the suspended kernel's memory has been read in. So the user-space resume process must be able to allocate pages in kernel space.
The answer is, of course, another ioctl() command, IOCTL_KMALLOC, which executes a get_zeroed_page() call and returns the address of the resulting page to user space. Once a full set of pages has been loaded with the suspended kernel's memory, an updated page map can be stored in the kernel, and an IOCTL_ATOMIC_RESTORE operation tells the resume kernel to finish the process.

This code is very much in an early stage; even people who do not hesitate to use software suspend may want to be careful with swsusp3 on systems they actually care about resuming. Once things settle down, however, swsusp3 could open the door to a number of features, including graphical progress displays and the ability to interrupt the suspend process, which users have been asking for.
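To make the data flow concrete, here is a minimal C sketch of the image layout the article describes: on suspend, every page on the snapshot list is copied into the image and the list itself goes last; on resume, each saved page is loaded into a freshly allocated page rather than its original address, and the pairing is recorded in a page map. The article does not give the ioctl() numbers, the in-kernel list format, or swsusp3's actual image format, so everything here is an assumption: `read_page_fn` is a hypothetical callback standing in for reads from /dev/kmem, `alloc_zeroed_page()` uses calloc() as a stand-in for IOCTL_KMALLOC, the trailer layout (addresses, then their count) is invented for the example, and the final IOCTL_ATOMIC_RESTORE step is not modelled.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical stand-in for the list IOCTL_ATOMIC_SNAPSHOT hands back. */
struct page_list {
    size_t count;
    const uintptr_t *addr;      /* kernel addresses of the pages to save */
};

/* Hypothetical page reader; swsusp3 would read /dev/kmem at `addr`. */
typedef int (*read_page_fn)(uintptr_t addr, unsigned char *buf, void *ctx);

/* One page-map entry: where a saved page belongs and where it was staged. */
struct map_entry {
    uintptr_t orig;
    unsigned char *staging;
};

/* Suspend side: copy every listed page into the image, then append the list
 * itself (addresses, then their count). Returns the image size or -1. */
long write_image(FILE *img, const struct page_list *pl, read_page_fn rd, void *ctx)
{
    unsigned char buf[PAGE_BYTES];
    for (size_t i = 0; i < pl->count; i++)
        if (rd(pl->addr[i], buf, ctx) != 0 || fwrite(buf, PAGE_BYTES, 1, img) != 1)
            return -1;
    if (fwrite(pl->addr, sizeof pl->addr[0], pl->count, img) != pl->count ||
        fwrite(&pl->count, sizeof pl->count, 1, img) != 1)
        return -1;
    return ftell(img);
}

/* Assumed stand-in for IOCTL_KMALLOC (get_zeroed_page() in the resume kernel). */
static unsigned char *alloc_zeroed_page(void)
{
    return calloc(1, PAGE_BYTES);
}

/* Resume side: read the trailer, then load each saved page into a fresh page.
 * Nothing is written to map[i].orig, because the resume kernel still needs
 * that memory until the final atomic restore. Returns the page count or -1. */
long load_image(FILE *img, struct map_entry *map, size_t max)
{
    size_t count;
    if (fseek(img, -(long)sizeof count, SEEK_END) != 0 ||
        fread(&count, sizeof count, 1, img) != 1 || count > max)
        return -1;
    /* The address list starts right after the page data. */
    if (fseek(img, (long)(count * PAGE_BYTES), SEEK_SET) != 0)
        return -1;
    for (size_t i = 0; i < count; i++)
        if (fread(&map[i].orig, sizeof map[i].orig, 1, img) != 1)
            return -1;
    rewind(img);
    for (size_t i = 0; i < count; i++) {
        map[i].staging = alloc_zeroed_page();
        if (map[i].staging == NULL || fread(map[i].staging, PAGE_BYTES, 1, img) != 1)
            return -1;          /* pages staged so far are leaked in this sketch */
    }
    return (long)count;
}
```

Keeping the page reader behind a callback is what lets the sketch run as an ordinary program. A real suspend tool would also lock itself into memory first, as the article says; on Linux the standard call for that is mlockall(MCL_CURRENT | MCL_FUTURE).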
User-space software suspend Posted Sep 29, 2005 9:13 UTC (Thu) by hawk (subscriber, #3195)
One of the fairly big benefits of swsusp2 is that it doesn't do away with any memory that can be done away with. Doing so may be ideal from some point of view (probably simplifies stuff), but it is definitely not ideal for the user!

After a suspend/resume cycle with swsusp2 (which is actually slightly quicker than a swsusp1 cycle!) the machine is in the same state as it was before suspending: it still has the running programs in memory, stuff cached, etc.

Swsusp1 may work "just as well" (for me at least), but it puts the system back in a very sorry state, where the system is on the verge of being unusable for some time after resuming.
User-space software suspend Posted Sep 29, 2005 11:45 UTC (Thu) by rise (subscriber, #5045)
Good points, though I'd like to note that in my experience a suspend2 suspend & resume cycle is much faster than a swsusp1 cycle even when keeping caches and buffers. Suspend2 also has the option to throw away both, which dramatically speeds up the cycle at the cost of a system that's initially a bit sluggish after resume as it faults everything back in, though no worse than swsusp1.
User-space software suspend Posted Sep 30, 2005 3:03 UTC (Fri) by zblaxell (subscriber, #26385)
I do like the fact that swsusp2 resumes with caches and buffers intact. If I wanted to wait while the system painfully restored this data one 4K page fault from swap at a time, I might as well reboot--it could actually be faster.
On the other hand, I generally like to run a small application before
suspending, which allocates memory until a few hundred pages are
swapped (it is a loop of malloc() and reading paging statistics out of
/proc), then exits. This dumps out some of the more useless 400MB or so
of caches on my system, and cuts resume time in half (it does add a
second or two to suspend), without the extreme pain of having to swap
_everything_ back in on resume.

I'm not sure what benefit there is in pushing too much of the suspend
and resume functions into user space. After a while we start to need a
whole lot of system calls to tell the kernel which of its "user space"
processes are in fact absolutely critical to the continued functioning
of the kernel, at which point IMHO it would be much simpler, safer,
smaller, and swifter to just push the whole thing back into
kernel-space.

If you combine user-space suspend and resume with user-space block
devices, user-space network devices, user-space encryption (on either),
user-space device configuration, network storage devices, and device
drivers that live partly or entirely in user-space, there's a whole lot
of stuff that is just bouncing back and forth between user-space and
kernel-space with no really sane reason to do so other than "we don't
have to do all of it in the kernel."

In one special case of user space--monolithic user-space
applications--there is a similar question of what to include in the
main application's space and what to farm out to other processes.
Sometimes the monolithic application is even called a "kernel." One
solution in common between the Linux kernel and other large
applications is to dynamically load code into the application's address
space (.ko's or .so's). Another solution is to initiate another process
with a separate address space, then communicate with the main
application over some kind of IPC (netlink, /proc, /sys, dbus, hotplug,
mmap...or sockets, pipes, shared memory, mmap).

There is a third option which is used by big applications but not the
Linux kernel: embedded interpreted languages. Modern applications, once
they cross a certain size threshold, tend to suddenly sprout a language
interpreter to cope with their more advanced configuration options
(where "configuration" sometimes amounts to "when I press this button,
execute 1500 lines of custom workflow code"). Things like netfilter get
close to this--iptables is almost Turing-complete, the chains are
analogous to functions, some of the experimental netfilter modules
implement dictionary lookups analogous to variables, and the
non-experimental modules can do basic boolean logic on packets
combining the results from multiple rules, as long as you don't need
more than 8 levels of nesting or 32 bits of storage per packet.
Netfilter in particular could benefit a lot from having a compiler in
user-space generate an optimized (not every netfilter chain entry
*needs* to look at the source and destination network/netmask, but they
do nonetheless) bytecode (or even machine code) filter configuration,
then pushing that code into a much simpler kernel-space implementation.
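A minimal sketch of the split proposed here: user space does the clever work of compiling a rule, and the kernel-side part is only a small interpreter that cannot loop, because jumps only go forward. The opcodes, struct names and `run_filter()` are all hypothetical and are not netfilter's or BPF's actual formats. The "compiled" rule drops packets from 10.0.0.0/8 and, unlike a generic chain entry, never looks at the destination address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a tiny packet-filter interpreter. */
enum op { OP_LD_SRC, OP_LD_DST, OP_AND, OP_JNE, OP_ACCEPT, OP_DROP };

struct insn {
    enum op op;
    uint32_t k;      /* immediate operand */
    uint8_t skip;    /* OP_JNE: instructions to skip forward when acc != k */
};

struct pkt {
    uint32_t src, dst;   /* IPv4 addresses in host byte order */
};

/* "Kernel side": a minimal interpreter. Jumps only go forward, so a program
 * of len instructions runs at most len steps. Returns 1 = accept, 0 = drop. */
int run_filter(const struct insn *prog, size_t len, const struct pkt *p)
{
    uint32_t acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *ins = &prog[pc];
        switch (ins->op) {
        case OP_LD_SRC: acc = p->src; break;
        case OP_LD_DST: acc = p->dst; break;
        case OP_AND:    acc &= ins->k; break;
        case OP_JNE:    if (acc != ins->k) pc += ins->skip; break;
        case OP_ACCEPT: return 1;
        case OP_DROP:   return 0;
        }
    }
    return 1;   /* falling off the end accepts */
}

/* "User-space compiler" output for: drop anything from 10.0.0.0/8.
 * It only tests the source, since the rule says nothing about dst. */
static const struct insn drop_10_8[] = {
    { OP_LD_SRC, 0, 0 },
    { OP_AND, 0xff000000u, 0 },
    { OP_JNE, 0x0a000000u, 1 },    /* not in 10/8: skip the DROP */
    { OP_DROP, 0, 0 },
    { OP_ACCEPT, 0, 0 },
};
```

Because every iteration advances the program counter by at least one, the interpreter's running time is bounded by the program length, which is the kind of property that makes pushing precompiled code into a simple in-kernel engine plausible.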
I'm surprised the Linux kernel doesn't have at least one interpreted
configuration language, not even as a module--other Unixish kernels and
their bootloaders do.

Most of the time, the only advantage I ever see from having things like
root filesystem configuration, device mapping, encryption, firmware
loading, etc. configured from or provided by user-space is that it is
then possible to do non-trivial configurations or experimental
implementations. For example, the md-RAID setup allows a number of
straightforward RAID configurations to be set up automatically by the
kernel, while the device-mapper and other LVM flavors are configured
from user-space and can (in theory) be a lot more flexible. Another
example is encrypted filesystem setup, where you almost certainly want
to have a custom user-space script to retrieve the decryption keys from
whatever they're stored on, match them up with the right partitions,
and of course collect the passphrase from the console. All this stuff
can easily be handled by even a minimal scripting language with the
right set of primitives--most of which would just be wrappers around
existing kernel code, e.g. open() or sha1().

I currently do this kind of userspace configuration on an initrd with
busybox (almost but not quite as painful as custom C code), custom
binaries (which are comparatively hard to fix when they break, unless
you have the presence of mind to keep a working development environment
on a bootable CD with you at all times), or even shell scripts (which
work, but take up megabytes of space for the 99% of the code you're not
using). IMHO they all suck. The amount of stuff that I have to put into
the initrd keeps getting bigger while the amount of stuff in the kernel
keeps getting...well, bigger, and yet the amount of stuff that the
kernel can do without help from user-space seems to be decreasing with
each new major kernel subsystem. Also, I have to go through some weird
flaky gymnastics to reconfigure user space (pivot_root and
real-root-dev come to mind here) without leaving dangling references to
multi-megabytes of initrd crap taking up RAM and swap. I'd rather just
put 20K of some simple script language runtime into the kernel, have
the kernel read and execute a 4K boot script, and be done with it. It
can't take more than that much code to prompt for a password, run it
through the appropriate salt and hash functions, set up two loop device
AES keys, then exec "/sbin/init".
User-space software suspend Posted Oct 6, 2005 17:36 UTC (Thu) by peschmae (guest, #32292)
> I do like the fact that swsusp2 resumes with caches and buffers intact.
> If I wanted to wait while the system painfully restored this data one 4K
> page fault from swap at a time, I might as well reboot--it could actually
> be faster.

Me too. But on my machine (a laptop; the hard disk is slow) rebooting would still be slower ;-)

> On the other hand, I generally like to run a small application before

Isn't that exactly what the "# ImageSizeLimit 200" item in hibernate.conf (or, respectively, /proc/software_suspend/image_size_limit) is there for? Does your way of doing more or less the same thing have an advantage over that? (Faster, maybe?)

> I'm not sure what benefit there is in pushing too much of the suspend and

I agree here, because it still seems to need a lot of code in the kernel; only a minimal part is a user-space application.

Peschmä
User-space software suspend Posted Oct 6, 2005 19:11 UTC (Thu) by zblaxell (subscriber, #26385)
Normally suspend2 writes all non-free pages (including clean cache pages and cached swap pages). This is a bit annoying for me, since 90% of the time I use less than 40% of my laptop's memory, but I have to wait for the other 60% of the RAM to be read and written at suspend and resume time.
ImageSizeLimit is an upper bound on the image size. If the image would
be larger than this, then there is a pre-suspend forcing of
pages--dirty or not--to disk. If the value is not dynamically chosen,
it is inefficient--too high, and unnecessary pages are written in the
suspend image; too low, and suspend and resume time is significantly
increased since a bunch of stuff has to be swapped out before suspend
and back in after resume, and page for page the swapper is much slower
than Suspend2's image writer. Dynamically choosing the value is
apparently non-trivial...at least I tried to do it for a while, then
gave up.

My application forces all the clean pages (600MB as I write this) to go
away, without losing active program text pages or forcing dirty pages
to swap. It stops as soon as there are more than 100 pages written to
swap since the program started running, so it does not significantly
extend the suspend time (a few hundred pages are swapped before the
application notices and exits, which does take a second or so).

This approach doesn't need prior configuration--it automatically
discovers just how much RAM can be cheaply freed by allocating as much
as the system can spare without swapping, then it exits and leaves
thousands of free pages.

Without all the extra pages, the suspend image is much smaller, so
suspend and resume are faster. Since only a few dirty or active pages
were actually swapped, it doesn't noticeably slow down the machine
after resume (there is more overhead when xscreensaver wakes up after
noticing the wall clock time jumping well past the inactivity
threshold than there is from post-resume swapping ;-).
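The stopping rule described above (quit once more than 100 pages have been written to swap since the program started) can be sketched in C using the pswpout counter that Linux exposes in /proc/vmstat as the number of pages swapped out since boot. This is an illustration, not the author's program: the 1 MB step, the names `parse_pswpout()`, `read_pswpout()` and `hog_until_swapping()`, and the choice of /proc/vmstat over other paging statistics are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (1L << 20)   /* grab 1 MB per step (arbitrary choice) */
#define SWAP_LIMIT 100           /* stop after this many pages swapped out */

static void *hog_list;           /* keeps every chunk reachable so nothing is optimized away */

/* Return the pswpout value from text in /proc/vmstat format, or -1. */
long parse_pswpout(const char *vmstat)
{
    for (const char *p = vmstat; (p = strstr(p, "pswpout ")) != NULL; p++)
        if (p == vmstat || p[-1] == '\n')     /* match whole keys only */
            return strtol(p + strlen("pswpout "), NULL, 10);
    return -1;
}

long read_pswpout(void)
{
    static char buf[16384];      /* assumed large enough for /proc/vmstat */
    FILE *f = fopen("/proc/vmstat", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_pswpout(buf);
}

/* Allocate and touch memory until more than SWAP_LIMIT pages have been
 * swapped out since we started; returns the number of 1 MB chunks taken.
 * The memory is deliberately never freed: exiting releases it all at once,
 * leaving the evicted cache pages behind as free RAM. */
long hog_until_swapping(void)
{
    long start = read_pswpout(), chunks = 0;
    if (start < 0)
        return -1;
    for (;;) {
        long now = read_pswpout();
        if (now < 0 || now - start > SWAP_LIMIT)
            break;
        char *m = malloc(CHUNK_BYTES);
        if (m == NULL)
            break;
        memset(m, 1, CHUNK_BYTES);   /* touch every page so it is really backed */
        *(void **)m = hog_list;      /* chain the chunks together */
        hog_list = m;
        chunks++;
    }
    return chunks;
}
```

A caller would run hog_until_swapping() from main() just before suspending and then exit. On a system without swap the counter never moves, so the loop only stops when malloc() fails or the OOM killer steps in, which anything beyond a sketch should guard against.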
User-space software suspend Posted Oct 30, 2005 1:51 UTC (Sun) by NinjaSeg (subscriber, #33460)
Errr, care to share it with us?
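User-space software suspend Posted Nov 4, 2005 0:43 UTC (Fri) by zblaxell (subscriber, #26385)

#!/usr/bin/perl -w
use strict;
use Time::HiRes qw(time);

# Assumed body (cut off in this copy): SwapFree in kB, from /proc/meminfo.
sub swapfree {
    open(my $fh, '<', '/proc/meminfo') or die "/proc/meminfo: $!";
    while (<$fh>) { return $1 if /^SwapFree:\s+(\d+)/ }
    return 0;
}
my $last_swapfree = swapfree;
my $count = 0;
my $start_time = time;
my @hog;
while ($last_swapfree <= (my $new_swapfree = swapfree)) {
    $count = push @hog, 'x' x (1 << 20);    # assumed loop body: touch 1MB more
}
printf "%d MB allocated in %.1f s\n", $count, time() - $start_time;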
User-space software suspend Posted Sep 30, 2005 16:41 UTC (Fri) by richardfish (guest, #20657)
Could not agree more! The biggest reason I prefer suspend2 is that it preserves cached memory.
User-space software suspend Posted Oct 6, 2005 15:31 UTC (Thu) by quintesse (subscriber, #14569)
How much longer is this going to take? It's the year 2005 for god's sake and Linux still has no perfectly working suspend/hibernate? It's really one of the few things about Linux that drives me nuts at times (re-installing all kernel modules for a new kernel is the other).

NB: But I think I was successful in convincing the maintainer of ATrpms to include swsusp2-enabled kernels in his repository, so hopefully I won't have to worry about swsusp2 anymore in the near future :-)

NB: Now to convince NVidia to make their drivers suspend-compatible!
User-space software suspend Posted Nov 7, 2005 22:01 UTC (Mon) by lacostej (subscriber, #2760)
> NB: Now to convince NVidia to make their drivers suspend-compatible!

I sure need that as well. Come on nvidia! I have a 3.5-year-old Dell laptop and suspend to disk has never worked!
User-space software suspend Posted Apr 8, 2006 20:51 UTC (Sat) by lacostej (subscriber, #2760)
I would like to update my statement.

After upgrading to Ubuntu Dapper Drake test Flight 6 and following this: https://wiki.ubuntu.com/NvidiaLaptopBinaryDriverSuspend I am finally able to suspend to RAM (and probably disk as well).

I've tested this with the latest nvidia kernel and madwifi (all installed by Ubuntu) while on a wireless connection with Skype running. Suspended. Waited for 30 seconds. Reopened the machine, tried to call the laptop using Skype from another PC, and It Just Worked. Finally.

Too bad the machine (Inspiron 8100) is getting really tired (4 years old, one dead battery, one dead USB, one dead PCMCIA, dead CD/DVD drive). That's without counting the 2 replaced hard disks and 2 replaced motherboards, plus the dead keyboard.
