Hello Jim,

When I last sent this, the time/date of the message was about half a day
before your initial message, so I am not sure whether you got my response.
Please see my response to your OOM message below.

Best regards,
Marc

Marc E. Fiuczynski wrote:
> Hello Jim,
>
> We (PlanetLab) have found that OOM does some relatively bad things, causing
> a system to get into an unusable state. We replaced OOM with something
> that just panics and reboots rather than letting the system get into an
> unrecoverable state, which we need as many of our PL servers are in
> remote locations and unattended (kinda like your mini-servers will be,
> but unlike your laptops). And we then introduced a user-level OOM
> governor, which is probably something far more rudimentary than what you
> are after. Our governor, called pl_mom because "she cleans up your
> mess", assumes that separate applications/services are instantiated in
> separate vservers (slices). From what I gather, this is definitely the
> direction that OLPC is going for the laptop and mini-server-gateways, so
> our approach might be applicable, at least from a thought perspective.
>
> What does pl_mom do? At the moment she kills the vserver with the
> largest aggregate VSZ (i.e., all processes within that vserver). This
> works for PlanetLab, but might not be the best approach for OLPC. We have
> found that most OOM scenarios are caused by a slow leaker that has its
> pages swapped out by kswapd (which happens on the order of a few hours and
> is hard to detect with the current vm metrics we peek at). Since pl_mom
> does the trick for our usage scenario on PlanetLab for now, we have not
> had an incentive to improve it further. However, one should definitely
> look at better vm statistics to make a better choice than largest
> aggregate VSZ.
>
> The code for pl_mom is available via anon cvs from:
>
> cvs -d :pserver:[EMAIL PROTECTED]:/cvs co pl_mom
>
> Take a peek at swap_mom.py and its helper functions in pl_mom.py.
>
> I'm cc'ing Faiyaz Ahmed, who is the person at Princeton who is currently
> maintaining pl_mom.
>
> Best regards,
> Marc
>
> Jim Gettys wrote:
>
>> OLPC needs an OOM governor, so that the "right" process gets shot when we
>> run low on RAM, and so that processes that might get shot know enough to
>> save state for restart. As you know, various problems appear if the
>> wrong process is killed, usually resulting in needing a restart.
>>
>> Note that the kernel has to be able to recover memory when it needs it,
>> or it will deadlock: this is a situation where the kernel must be in
>> control, but user space could cooperate much better than it does today,
>> by providing appropriate hints. So don't say: "the kernel shouldn't
>> kill processes: user space should"; that design doesn't fly.
>>
>> Here's Kimmo Hämäläinen's description of the (current) kernel OOM killer.
>>
>> The OOM killer selects a process to kill by assigning a score to each
>> process; the process with the highest score is the lucky winner that
>> will be killed. The current OOM score for a process is visible in proc.
>> The entry is in /proc/PID/oom_score. The starting point of the score is
>> the amount of memory consumed by the process and its children. This
>> value is adjusted as follows:
>> • It is set to zero if the process has no memory management or if the
>>   process has a negative nice value (this can be used for protecting
>>   processes from the killer).
>> • Divided by the square root of the CPU time consumed by the process.
>> • Divided by the square root of the square root of the run time of the
>>   process.
>> • Multiplied by 2 if it is a process with a positive nice value.
>> • Divided by 4 if it is a superuser process.
>> • Divided by 4 if it is a process with direct hardware access.
>> • Finally, the value is adjusted (shifted either left or right) by the
>>   oom_adj value. It is shifted left in case the value is positive and
>>   right in case the value is negative.
>> This means that a negative oom_adj value will decrease the score and
>> also decrease the risk that the particular process will be killed. A
>> positive value will have the opposite effect. The value should be no
>> smaller than -16 and no larger than 15.
>>
>> Please note that you can set the oom_adj value in the proc file system.
>> It is located at /proc/PID/oom_adj. For more information about how the
>> OOM killer behaves, see the Linux kernel source code, mm/oom_kill.c in
>> particular.
>>
>> So we need an OOM killer helper.
>>
>> We have the ability to provide the kernel with much of the
>> information it needs for much better behavior, if we choose.
>>
>> I see this project evolving through the following incremental
>> improvements (and incremental difficulty) as set out below:
>>
>> 1) start by setting the oom_adj appropriately so that the processes we
>> really care about don't get shot.
>>
>> 2) make this a window manager plug-in (a plug-in, as people including us
>> may end up using other window managers) that uses the stacking order on
>> the screen to rank order the activities that are running.
>>
>> 3) provide a mechanism by which applications may be given a hint that
>> they might find it good to save enough state for a checkpoint restart,
>> because they are likely a good candidate for shooting.
>>
>> 4) use the XRes facilities in X (and/or modify X) to provide the kernel
>> with the pixmap usage on a process ID basis, for local
>> applications/activities.
>>
>> 5) see if there are better OOM algorithms than Linux presently has.
>>
>> Discussion? Anyone want to take on this project, or parts of this
>> project?
>> - Jim
>>
>>
> _______________________________________________
> Devel mailing list
> [email protected]
> http://lists.laptop.org/listinfo/devel
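
For readers who want to experiment with the scoring rules quoted in the thread, here is a rough Python sketch of that heuristic. It follows the prose description only and is not taken from mm/oom_kill.c. The function name `oom_score_sketch`, its parameter names, the units of the inputs, the truncation to an integer before shifting, and the guard that skips a division when a time value is zero are all assumptions made for illustration.

```python
import math


def oom_score_sketch(mem, cpu_time, run_time, nice=0, has_mm=True,
                     superuser=False, hw_access=False, oom_adj=0):
    """Approximate the score adjustments described in the quoted text.

    mem is the memory consumed by the process and its children; cpu_time
    and run_time are in whatever units the caller picks (the thread does
    not state units, so none are assumed here).
    """
    if not -16 <= oom_adj <= 15:
        raise ValueError("oom_adj should be between -16 and 15")

    # Zero if there is no memory management or the nice value is negative.
    if not has_mm or nice < 0:
        return 0

    score = float(mem)

    # Divide by sqrt(CPU time); skipping zero is an assumption of this sketch.
    if cpu_time > 0:
        score /= math.sqrt(cpu_time)

    # Divide by the fourth root of the run time (sqrt of sqrt).
    if run_time > 0:
        score /= math.sqrt(math.sqrt(run_time))

    if nice > 0:
        score *= 2      # positive nice value doubles the score
    if superuser:
        score /= 4      # superuser processes are less likely victims
    if hw_access:
        score /= 4      # so are processes with direct hardware access

    # Truncate, then shift left for positive oom_adj, right for negative.
    score = int(score)
    if oom_adj > 0:
        score <<= oom_adj
    elif oom_adj < 0:
        score >>= -oom_adj
    return score
```

Calling `oom_score_sketch(1600, 16, 16, oom_adj=-2)` against `oom_score_sketch(1600, 16, 16, oom_adj=2)` shows how a negative oom_adj shrinks the score and a positive one inflates it, which is the lever step 1 of the plan relies on.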
