[chromium-dev] Re: [cros-dev] Proposed OOM improvements.

Luigi Semenzato Wed, 04 Aug 2010 18:33:54 -0700

I suspect there is one issue you may want to consider even before you
get to the ones you mention.  We've had reports of "extreme slowness",
and I was able to reproduce such a situation in the past.  The slowness
(and pegged disk activity) is consistent with thrashing due to code
paging.  Even though we don't use swap, the kernel will still reclaim
read-only executable pages since they have a backing store (the
executable file).  I suspect this may make the system unusable before
you get into an actual OOM situation.
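
One rough way to check for this kind of thrashing is to watch the rate of
major page faults, since each one means a page had to be read back from
disk.  The sketch below assumes the kernel exposes a "pgmajfault" counter
in /proc/vmstat; the threshold of 200 faults per second is a made-up number
for illustration, not a measured one.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    """Take one snapshot of the kernel's VM counters."""
    with open(path) as f:
        return parse_vmstat(f.read())


def major_fault_rate(before, after, seconds):
    """Major page faults per second between two snapshots."""
    return (after["pgmajfault"] - before["pgmajfault"]) / seconds


def looks_like_thrashing(rate, threshold=200.0):
    """Hypothetical heuristic: flag rates above an arbitrary threshold."""
    return rate > threshold
```

Taking two snapshots a few seconds apart and comparing them while the
system is "extremely slow" should show whether code paging is the cause.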


Other than that, this seems like a good plan.  The "low-on-memory" UI
is something that is sorely missing from existing systems.


On Wed, Aug 4, 2010 at 3:06 PM, Greg Spencer <gspen...@chromium.org> wrote:
> Hi Folks,
>
> Here's my proposal for improving OOM situations on ChromeOS. In a nutshell,
> the idea is that we'll tune the OOM killer's algorithm to match what we
> want, and make the UI more explicit about what happened when a tab is killed
> by the OOM killer.
> Please let me know if you have any suggestions/comments.
> -Greg.
>
> Document link:
> https://docs.google.com/a/google.com/document/edit?id=1ddPY1-v7ZFr0jmhuxw04ehNLMQrPzxHhL6vUoBzELBo&hl=en
>
> (but that probably won't work outside google.com: see below for full text).
>
> -------------------
>
> Out of Memory Management for ChromeOS
>
> Greg Spencer (gspencer), ChromeOS UI team.
>
> Intro
>
> Like all computers, ChromeOS devices have limited memory, and bad things
> happen when we run out of physical memory.  We’d like to make ChromeOS be
> more elegant than most OSs when it runs into this situation.  To that end,
> we’re looking to improve the user experience around out of memory (OOM)
> conditions.
>
> Current State
>
> Currently, when a ChromeOS device runs out of memory, processes are killed by
> the OOM killer (a part of the kernel) until enough memory is available.
>  Because we have no swap configured, but do allow overcommit (i.e. malloc
> pretends it has nearly unlimited memory when handing out addresses),
> eventually a process tries to use memory assigned to it in the virtual
> address space that isn’t actually available, and the kernel asks the OOM
> killer to kill processes on the system until enough memory is available.
>  The processes killed don’t necessarily include the one that started to use
> the unavailable memory, but rather are based on the OOM killer’s “badness”
> algorithm.
>
> We don’t have a swap partition configured because we’re afraid that it will
> start killing blocks on the SSD after an unreasonably short time.  [I
> haven’t verified that this is indeed a problem, but I’m assuming that the
> original decision wasn’t made in a vacuum.  It does seem to me that write
> levelling in the SSD hardware should mitigate this somewhat, however].
>
> Currently, the renderers, plugins, and browser processes are (not
> surprisingly) the largest users of memory on Chrome OS.  The renderers and
> plugins can be killed without crashing the system, but killing the browser
> process (which can grow quite large) causes the entire session to restart,
> so we want to kill that as a last resort (or at least after all the
> renderers and plugins have been killed).
>
> Linux Chrome already uses the /proc/<PID>/oom_adj method to prioritize
> renderers and plugins over the browser process (or system processes) for
> being killed by the OOM killer, so they’ll be killed in the right order.
> This works fairly well, but the default OOM killer algorithm prefers to kill
> recent processes instead of older processes, so this is not quite optimal
> for us, as we would prefer to kill older tabs over newer ones, and
> non-pinned tabs before pinned ones.  [The file /proc/<PID>/oom_adj contains
> a bit-shift value from -17 to 15 that adjusts the badness value of the
> process.]
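
To make the "bit-shift" concrete, here is a small model of how an oom_adj
value changes a badness score.  It is a sketch based on my reading of the
2.6-era kernel's badness() function, where -17 exempts a process from the
OOM killer altogether; the function name and the sample scores are
illustrative only.

```python
OOM_DISABLE = -17  # special value: never pick this process


def adjusted_badness(points, oom_adj):
    """Model of how oom_adj shifts a process's badness score."""
    if not -17 <= oom_adj <= 15:
        raise ValueError("oom_adj must be between -17 and 15")
    if oom_adj == OOM_DISABLE:
        return 0
    if oom_adj > 0:
        # A zero score is bumped to one so the shift has an effect.
        if points == 0:
            points = 1
        return points << oom_adj
    return points >> -oom_adj
```

With an adjustment of 5, a base score of 1000 becomes 32000, so even a
small renderer outranks a much larger browser process left at 0.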
>
> Also, when things are killed, the “Sad Tab” page is displayed, which doesn’t
> communicate the nature of the failure.
>
> Possible Methods of Controlling Memory Usage on ChromeOS
>
> [This is the brainstorming part of the doc.  Not all of these will be
> implemented.]
>
> Kernel Level
>
> * Change overcommit behavior (change to "overcommit_ratio"), to encourage
>   more NULLs being returned from malloc instead of the OOM killer getting
>   happy and killing stuff randomly.  This might not actually help things --
>   it'll mean that the process that is trying to allocate always gets killed
>   via segfault instead of another less important process.
> * Use the mem_notify kernel module to send notifications when thresholds
>   are reached, if we aren't already.  This is useful for clearing caches,
>   garbage collecting, etc., but isn’t a solution to the overall problem.
>   This may be useful for marking which tabs are killed by the OOM killer
>   and which are killed for other reasons.
> * Severely re-nice or stop processes that abuse memory in order to have
>   resources to let the user pick what to do (but it may not be possible for
>   this to happen fast enough in all cases).
> * Set up some small swap space (e.g. 50M) so that any very static data in
>   memory gets swapped out.  We currently have at least 25M of data that
>   never gets accessed again once the app is loaded.
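
For the overcommit item above: when the kernel is in strict accounting mode
(vm.overcommit_memory set to 2), the total it will hand out is roughly swap
plus a percentage of RAM given by overcommit_ratio.  The sketch below
ignores huge pages, and the 2048M figure is an assumed device size, not one
taken from the doc.

```python
def commit_limit_mb(ram_mb, swap_mb, overcommit_ratio):
    """Approximate commit limit under strict overcommit accounting."""
    return swap_mb + ram_mb * overcommit_ratio // 100
```

With no swap, a 2048M device and a ratio of 50 would refuse allocations
past about 1024M of committed memory, which is where malloc would start
returning NULL.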
>
> Chrome Level Changes
>
> Respond to mem_notify events (in order of how draconian they are) with
> actions that don’t require user notification.  This is by its nature a
> bandaid, as any memory sponge will quickly eat up the freed memory:
>
> * Flushing memory HTML caches.
> * Garbage collecting all V8, Crore, Flash instances.
> * Sharing renderers among more tabs, killing some renderers.  [Darin says
>   this probably won’t gain us much -- only means we can share a few more
>   font tables, etc, and will slow things down considerably due to swapping
>   out DOMs, etc.]
> * Empty Flash and HTML5 audio/video buffers (and maybe notify user because
>   they’ll all start rebuffering if they are playing).
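
The "in order of how draconian they are" idea could be structured as an
escalation loop: run the mildest action first and stop once enough memory
has come back.  This is only a sketch; the action names and the notion that
each action can report how many bytes it freed are assumptions of mine.

```python
def relieve_pressure(actions, bytes_needed):
    """Run actions from mildest to harshest until enough memory is freed.

    actions is a list of (name, callable) pairs, where each callable
    returns the number of bytes it managed to free.
    """
    freed = 0
    taken = []
    for name, action in actions:
        if freed >= bytes_needed:
            break
        freed += action()
        taken.append(name)
    return freed, taken
```

A caller would pass something like [("flush_html_cache", ...),
("collect_garbage", ...), ("empty_media_buffers", ...)] and only reach the
later, more disruptive entries when the earlier ones did not free enough.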
>
> Other measures that may require user notification:
>
> * Closing no-content tabs (new tab pages, about: pages).
> * Closing windows that only have non-content tabs in them (e.g. an empty
>   window running with just the new tab page).
>
> Reduce memory usage in the first place by:
>
> mmap’ing large images (which would get swapped out on low memory by the
> kernel).  [We may already be doing this]
>
> Implementation Plan
>
> Given that some of the suggestions above require more work than others, I’m
> planning to pick the low hanging items first, and then see how much bang
> that gives us, and then move on to more time consuming mitigation if that’s
> not sufficient.
>
> Phase 1 -- Tune OOM killer algorithm
>
> I'm going to collect the following information:
>
> * Whether or not a tab is pinned
> * When was the last time the user clicked on or entered something into the
>   tab
> * When was the last time the user clicked on the tab to make it current
> * How much memory the tab is using
>
> And then I'm going to come up with an algorithm (TBD) that ranks tabs based
> on these criteria.  The algorithm will probably prefer to kill tabs that
> aren’t pinned, have been idle for the longest, and use the most memory.
>  It’ll probably kill plugins before killing renderers.
>
> I'm going to write a manager into the browser process that every so often
> (every five seconds or so) adjusts the oom_adj value of all the renderers
> and plugins to sort them based on the algorithm above.  I will probably only
> need to adjust them a little -- the current renderer and plugin processes
> get an adjustment of five (which shifts the badness up by five bits).  I'll
> probably just have three to five different levels of badness to assign,
> starting at five (where larger is more likely to be killed).
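
Since the ranking algorithm is still TBD, the following is just one
possible shape for the manager: sort tabs by (unpinned, idle time, memory)
and spread them over a few oom_adj levels starting at five.  The TabInfo
fields, the lexicographic ordering, and the choice of four levels are all
assumptions made for the sake of the example.

```python
from dataclasses import dataclass

BASE_OOM_ADJ = 5  # current adjustment for renderers and plugins
NUM_LEVELS = 4    # assumed; the proposal says three to five


@dataclass
class TabInfo:
    pid: int
    pinned: bool
    seconds_since_input: float
    seconds_since_selected: float
    memory_kb: int


def kill_priority(tab):
    """Sort key: tabs with larger keys should be killed first."""
    idle = min(tab.seconds_since_input, tab.seconds_since_selected)
    return (not tab.pinned, idle, tab.memory_kb)


def assign_oom_adj(tabs):
    """Map each renderer pid to an oom_adj value in a small range."""
    ranked = sorted(tabs, key=kill_priority)  # least killable first
    result = {}
    for i, tab in enumerate(ranked):
        level = i * NUM_LEVELS // len(ranked)
        result[tab.pid] = BASE_OOM_ADJ + level
    return result


def write_oom_adj(pid, value):
    """Apply an adjustment by writing to /proc/<PID>/oom_adj."""
    with open(f"/proc/{pid}/oom_adj", "w") as f:
        f.write(str(value))
```

The periodic part would then just call assign_oom_adj() every five seconds
or so and write out any values that changed.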
>
> I'm going to change the UI so that when a tab is killed by the OOM killer, it
> displays a page different from the “Sad Tab” page that tells the user what
> happened and why, and gives them the option to reload the page.  This may be
> a little tricky to determine, as there really isn’t a lot of warning when
> the OOM kills your process.
>
> It has been suggested that we just let the OOM killer kill a tab, mark it,
> and just reload it the next time the user visits it.  We can test this, but
> my feeling is that the user will occasionally be very surprised to find that
> this happens, that some web apps will handle this poorly, and that losing
> user data on reload is something we need to explicitly notify the user
> about.  It seems to me that if we can’t guarantee full reload (save DOM
> state, javascript variable state, plugin state, etc.), that this is a shabby
> thing to do: it’s cleaner to tell them why we killed it and let them decide
> if they want to chance reloading it.
>
> As I’m implementing this, I’ll write a test that will exercise the OOM
> killer algorithm.  Hopefully that’s not too tricky to get into our testing
> framework without being flaky.
>
> Phase 1.1 -- Add in Networking info to OOM killer tuning.
>
> Collecting the last time a tab accessed the network is complicated to
> implement (e.g. sandboxed network access happens in another process and so
> has to be tracked back to a renderer), so I’ll implement that only if we
> think it’ll help with tuning.  The main idea here is that music streaming
> apps might be likely to be killed based on the other criteria, so this
> helps recognize tabs that are streaming in the background.  The fallback is
> to have the user pin streaming tabs.
>
> Phase 2 -- Notify user when memory is getting low
>
> In this phase we post some kind of notification when we get a mem_notify
> event that we’re low on memory.  At that point, we can ask the user to kill
> off memory intensive applications.  This will require a UI similar to the
> task manager (it might even be the task manager) so that the user can make
> informed choices about what to kill.  In order to be able to display this UI
> when the memory is low, we’ll have to pre-allocate it and keep it around
> until needed.
>
> This feels like a pretty heavy UI, and I’m not sure all users will feel
> qualified to decide what to kill.  Maybe just give them a choice of the top
> five candidates for killing?
>
> Phase 3 -- Flush all caches on mem_notify events
>
> In this phase we try and flush all available caches in the OS -- plugins,
> browsers, etc. when we get our first mem_notify event that we’re out of
> memory.  This step seems like a bandaid -- it’s only going to help the first
> time it happens, and thereafter there will be almost nothing freed until the
> caches have time to refill.  This would, however, be good in combination
> with user notification that there is too much memory being used, since it
> may buy the user some time to manage their tabs.
>
> --
> Chromium OS Developers mailing list: chromium-os-...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-os-dev?hl=en
>

-- 
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
