On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
> Dear devel@,
>
> Kim, Greg, and I have concluded that the instability we experience under
> memory-pressure in 8.2-759 and similar is the single "hard" issue that
> we wish to _attempt_ to address before releasing 8.2 on current
> timeframes. (We recognize that there are several other issues marked
> as blocking the release but we are confident that they will be resolved
> satisfactorily or are, in a few cases, beyond help.)
>
> Since most other aspects of the release seem to be running smoothly, Kim
> asked me to take a more direct role in organizing our efforts to produce
> a release which avoids memory pressure when possible and which is
> better-behaved when it strikes.
>
> To that end, I would like to ask for your assistance with the following
> questions and tasks:
>
> * We need to determine why we encounter low-memory and out-of-memory
>   situations more frequently than in previous releases.
>
>   - This means that we need to measure how our memory consumption
>     profile has changed since our previous releases.
>
>     (cscott observes that we were unable to attack the F-9 image size
>     issues until we were able to quantify the effect of changes we had
>     made or were considering making. Consequently, he suggests that we
>     will be unable to attack our current space consumption problems
>     until we are able to generate good numbers (and displays).)
>
>   - We need to think carefully about (or measure) whether our
>     memory-consumption patterns have changed. I am particularly
>     skeptical of our widespread use of tmpfsen since the pages consumed
>     by files stored on tmpfsen are permanently dirty (and are perhaps
>     accounted for differently than pages mapped into processes' address
>     spaces?)
>
>   - We need to check the configuration of applications like Browse
>     which have configurable caching behavior. (Search for "cache" or
>     "capacity" in about:config; check for important compile-time
>     configuration flags.)
>
>   - We need to test in a variety of different network configurations
>     in order to determine to what extent the network/presence
>     environment affects memory consumption.
>
> * We need to check carefully for memory-leaks. Three mechanisms which
>   occur to me include:
>
>   1) running the system for a period of time, then scanning for
>      anomalies either manually or in some automated fashion from
>      userland, kernel-land, or OFW (via SysRq or SMM).
>
>   2) setting rlimits on various processes and noting what dies
>
>   3) using debugging tools like the python garbage collection
>      module, guppy/heapy, gdb+macros, valgrind, efence, purify, etc.
>      to look for trouble.
>
> * We need to find out why the oom-killer is not killing things fast
>   enough. Based on our results, we might consider configuring
>   /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
>   foreground [or background?] activities.)
>
> * We need to determine whether the oom-killer is killing the right
>   processes. (sysctl's vm.oom_dump_tasks can be set to 1 in order to
>   get more verbosity from the oom-killer when it fires).
>
> * We ought to ponder whether there are any additional "dirty hacks" we
>   can experiment with in order to reduce memory consumption; for
>   example, running the Shell and Journal (and DS?) in one process or
>   making use of the compressed-caching code published on this list some
>   months ago.
>
> * Random other stuff to think about:
>
>   - rlimits, cgroups, and the memory resource controller
>
>   - the warnings in the ramfs and tmpfs code about the deadlocks that
>     tmpfsen can generate under low- or no-memory conditions.
>
>   - whether our kernel "overcommits" when allocation requests are made?
>
>   - whether we can get Browse to behave intelligently when it receives
>     BadAlloc errors from X?
>
>   - how to run bootchart on the XO
>
>   - how to generate decent statistics and graphics (preferably in an
>     automated fashion) concerning memory usage as part of our test
>     suite
>
>   - system-tap's kmalloc2.stp example
>
> In conclusion, more to come once I have some actual data; _please_ feel
> free to assist in collecting it! (though be aware that I may 'volunteer'
> you if I need your help. (That means you, Tomeu, Riccardo, Deepak,
> ...)).
>
> Regards,
>
> Michael
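
On the "generate decent statistics ... in an automated fashion" item
quoted above: a minimal sketch of a per-process memory snapshot, read
from /proc/<pid>/status. This is not taken from any of the tools
discussed in this thread; the function names (parse_status, snapshot,
report) are made up for illustration, and it assumes a Linux /proc.

```python
import os


def parse_status(text):
    """Return (name, {field: kB}) parsed from the text of /proc/<pid>/status."""
    name, sizes = "", {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        parts = value.split()
        if key == "Name":
            name = value.strip()
        elif key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            sizes[key] = int(parts[0])
    return name, sizes


def snapshot(proc="/proc"):
    """Map pid -> (name, VmRSS in kB) for every process we can read."""
    usage = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        path = os.path.join(proc, entry, "status")
        try:
            with open(path, errors="replace") as f:
                name, sizes = parse_status(f.read())
        except OSError:
            continue  # the process exited, or we may not read it
        if "VmRSS" in sizes:  # kernel threads report no Vm* fields
            usage[int(entry)] = (name, sizes["VmRSS"])
    return usage


def report(usage, limit=10):
    """Format the `limit` largest entries of a snapshot, biggest first."""
    rows = sorted(usage.items(), key=lambda item: item[1][1], reverse=True)
    return ["%6d %8d kB  %s" % (pid, rss, name)
            for pid, (name, rss) in rows[:limit]]
```

Printing "\n".join(report(snapshot())) at intervals during a test run
would give numbers to compare between builds. Note that VmRSS counts
shared pages once per process, so sums over processes overstate real
usage; on kernels that provide it, the Pss field in /proc/<pid>/smaps is
a fairer figure.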
Here are some (trivial) tools I've written and used, among others, to
study these issues; you may find them useful:

* picker [1]
  I found it handier to use than bootchart; it also shows per-process
  memory usage.

* import timings and alloc statistics [2]
  A patch to python that prints timings and memory usage diffs for every
  imported module. The original timings patch is from Tomeu.

* python-allocstatsmodule [3]
  Inspired by [2], but can be used inside python scripts to collect
  stats on heap usage.

  ! When you use `allocstats' to get per-module memory usage by wrapping
    import statements, the values will be quite rough and not very
    useful because of import cycles (at least for the most interesting
    modules ;P).

  Example app at
  http://dev.laptop.org/~rlucchese/utils/python_mods_import_stats.py

Note that [2] and [3] are best used with a python built with
--without-pymalloc.

We measured quite big memory savings from using the preload&fork trick
(as expected, btw). I guess enabling this for `all' python processes
would have a good (mem saving)/(work hours) ratio.

thanks,
riccardo

[1] git://dev.laptop.org/activities/picker
[2] http://dev.laptop.org/~rlucchese/patches/python_show_mem_stats_on_module_loading.patch
[3] git://dev.laptop.org/users/rlucchese/python-allocstatsmodule/.git
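
For anyone unfamiliar with the preload&fork trick mentioned above, here
is a minimal sketch of the general idea, assuming it means "import the
common modules once in a long-lived parent, then fork each python
process from it so the imported pages are shared copy-on-write". It is
not the code we measured; preload and spawn are hypothetical names.

```python
import importlib
import os
import sys


def preload(module_names):
    """Import the modules once in the parent so forked children inherit them."""
    return [importlib.import_module(name) for name in module_names]


def spawn(target, *args):
    """Fork a child that runs target(*args); return the child's pid.

    The child exits with status 0 on success and 1 if target raises.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            target(*args)
            status = 0
        finally:
            # Leave immediately, without running the parent's cleanup handlers.
            os._exit(status)
    return pid
```

A launcher would call preload([...]) at startup with the modules most
processes need, then spawn(...) for each new process and os.waitpid()
to reap it. The saving is only partial in CPython: reference-count
updates write to object headers, so pages holding live objects are
gradually copied in each child.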