There are four classes of things we can/should/could do:

1) Understand where our memory is being used. Individual bugs can have
   a large effect. Something stupid could be hurting us badly, and we
   won't know unless we look. What is more, we need to invest in tools
   that allow us to monitor this.

2) There are some band-aids that have been discussed, such as rlimits,
   which we can experiment with, and that *might* improve the situation
   without the real solutions the next two items go into.

3) The oom killer's default algorithms are pretty terrible, taking
   little into account in the choice of what gets killed. Between
   Sugar/Rainbow and the knowledge that the window manager has, one
   could do much better.

4) We provide no end user feedback on memory usage, either. We should
   investigate revisiting our previous attempt to give such feedback,
   now that Linux can provide much better information than it could
   when we abandoned our previous donut attempt. The users could
   really help, if only we let them know a bit about what was going
   on...
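
As a starting point for 1), here is a minimal sketch of the kind of monitoring tool meant there. It is an illustration only, not an existing OLPC tool: the function names (parse_vm_fields, read_process, top_by_rss) are made up for this example, and it assumes a Linux /proc where each /proc/<pid>/status carries Name: and Vm* lines. VmRSS counts shared pages once per process, so the totals overstate real usage; treat the ranking as a first look, not an accounting.

```python
import os


def parse_vm_fields(status_text):
    """Return {'VmRSS': kB, ...} for the Vm* lines of a /proc/<pid>/status dump."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not key.startswith("Vm"):
            continue
        parts = rest.split()
        # Vm* lines look like "VmRSS:   1234 kB"; keep the number only.
        if parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def read_process(pid, proc_root="/proc"):
    """Return (name, vm_fields) for one pid, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            text = f.read()
    except OSError:
        # The process exited, or we lack permission; skip it.
        return None
    name = "?"
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
            break
    return name, parse_vm_fields(text)


def top_by_rss(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    rows = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return rows
    for entry in entries:
        if not entry.isdigit():
            continue
        info = read_process(entry, proc_root)
        if info is None:
            continue
        name, vm = info
        # Kernel threads have no Vm* lines, so they drop out here.
        if "VmRSS" in vm:
            rows.append((vm["VmRSS"], int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_by_rss():
        print("%8d kB  %6d  %s" % (rss_kb, pid, name))
```

Run periodically and logged, output like this would at least show whether one activity is growing without bound before the oom killer gets involved.
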
In terms of priority: immediately examining what is going on with
memory usage, in case we have a bad leak, is clearly worthwhile (1).
We need to budget immediately for tool-building to monitor the
situation going forward.

2) and *possibly* (a beginning on) 3) may be 8.2.1 fodder, but without
feedback from more users, we won't know if this isn't just keys under
the lamppost (e.g. our multiple bug reports about Browse OOMing
because of our amazingly stupid hardware wiki page, which is one of
the most egregious pages I've seen in recent memory).

Doing 3) pretty well is, I suspect, 9.1 fodder, but only if we start
very soon. My gut tells me it's some man-months of work. We might get
lucky, and should investigate whether any of the embedded folks have
something we can use. Unfortunately, the Nokia folks I had thought
might have something didn't, when I last checked a year ago. But we
can/should check a bit first before diving in; it's a year later.

http://dev.laptop.org/ticket/1995

I urge we investigate quickly whether 4) is, in fact, feasible, so
that it can go on the Sugar roadmap in time to be done for 9.1.
                                  - Jim

On Tue, 2008-09-09 at 13:02 +0200, Tomeu Vizoso wrote:
> On Tue, Sep 9, 2008 at 6:10 AM, Michael Stone <[EMAIL PROTECTED]> wrote:
> >
> > * We need to find out why the oom-killer is not killing things fast
> >   enough. Based on our results, we might consider configuring
> >   /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
> >   foreground [or background?] activities.)
>
> Any reason why killing first activities' processes wouldn't solve the
> stability issue? AFAIK, we haven't seen OOM conditions without any
> activity open.
>
> Just in case, I'm not saying that isn't worth to do any of the other
> things on your list.
>
> Regards,
>
> Tomeu
> _______________________________________________
> Devel mailing list
> Devel@lists.laptop.org
> http://lists.laptop.org/listinfo/devel
-- 
Jim Gettys <[EMAIL PROTECTED]>
One Laptop Per Child