On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
> >>
> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
> >> > If you're not interested in TCMalloc customization for Chromium, you
> >> > should stop reading now. This post is meant to gather some discussion
> >> > on a topic before I code and land a change.
> >> >
> >> > MOTIVATION
> >> > We believe poor memory utilization is at the heart of a lot of jank
> >> > problems. Such problems may be difficult to repro in short, controlled
> >> > benchmarks, but our users are telling us we have problems, so we know
> >> > we have problems. As a result, we need to be more conservative in
> >> > memory utilization and handling.
> >> >
> >> > SUMMARY OF CHANGE
> >> > I'm thinking of changing our TCMalloc so that when a span is freed
> >> > into TCMalloc's free list and gets coalesced with an adjacent span
> >> > that is already decommitted, the coalesced span is entirely
> >> > decommitted (as opposed to our current customized behavior of
> >> > committing the entire span).
> >> > This proposed policy was put in place previously by Mike, but
> >> > (reportedly) caused a 3-5% perf regression in V8. I believe AntonM
> >> > changed that policy to what we have currently, where we always ensure
> >> > full commitment of a coalesced span (regaining V8 performance on a
> >> > benchmark).
> >>
> >> The immediate question and plea. Question: how can we estimate the
> >> performance implications of the change? Yes, we have some internal
> >> benchmarks which could be used for that (they release memory heavily).
> >> Anything else?
> >>
> >> Plea: please do not regress DOM performance unless there are really
> >> compelling reasons. And even in this case :)
> >
> > Anton -
> > All evidence from user complaints and bug reports is that Chrome uses
> > too much memory. If you load Chrome on a 1GB system, you can feel it
> > yourself. Unfortunately, we have yet to build a reliable swapping
> > benchmark. By allowing tcmalloc to accumulate large chunks of unused
> > pages, we increase the chance that paging will occur on the system. But
> > because paging is a system-wide activity, it can hit our various
> > processes in unpredictable ways - and this leads to jank. I think the
> > jank is worse than the benchmark win. I wish we had a better way to
> > quantify the damage caused by paging. Jim and others are working on
> > that.
> >
> > But it's clear to me that we're just being a memory pig for what is
> > really a modest gain on a semi-obscure benchmark right now. Using the
> > current algorithms, we have literally multi-hundred-megabyte memory
> > usage swings in exchange for 3% on a benchmark. Don't you agree this is
> > the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when
> > you switch tabs it drops to <100MB.) Similar behavior has been observed
> > on other pages (e.g., loading the histograms page).
> >
> > We may be able to put in some algorithms which are more aware of the
> > currently available memory going forward, but I agree with Jim that
> > there will be a lot of negative effects as long as we continue to have
> > such large memory swings.
>
> Mike, I completely agree that we should reduce memory usage. On the
> other hand, speed has always been one of Chrome's trademarks. My feeling
> is that more committed pages in the free list make us faster (but yes,
> there is paging, etc.). That's exactly the reason I asked for some way
> to quantify the quality of the different approaches, esp. given the
> classic memory vs. speed dilemma; ideally (imho) both speed and memory
> usage should be considered.
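To make the tradeoff we're debating concrete, here is a minimal sketch of
the two coalescing policies. Span, CoalescePolicy, and Coalesce() are
hypothetical names; this is not the actual tcmalloc PageHeap code, just an
illustration:

    // A minimal sketch, not the actual tcmalloc PageHeap code: the names
    // here are invented to illustrate the two policies being debated.
    #include <cstddef>

    struct Span {
      size_t start_page;  // first page of the span
      size_t num_pages;   // length in pages
      bool committed;     // today a span is all-committed or all-decommitted
    };

    enum class CoalescePolicy {
      kCommitAll,    // current behavior: commit the merged span (fast reuse)
      kDecommitAll,  // proposed behavior: decommit the merged span (less RAM)
    };

    // Called when a freed span `a` merges with an adjacent free span `b`
    // whose commit state differs.
    Span Coalesce(const Span& a, const Span& b, CoalescePolicy policy) {
      Span merged;
      merged.start_page = a.start_page < b.start_page ? a.start_page
                                                      : b.start_page;
      merged.num_pages = a.num_pages + b.num_pages;
      if (a.committed == b.committed) {
        merged.committed = a.committed;  // states agree; nothing to change
      } else if (policy == CoalescePolicy::kCommitAll) {
        // Commit the decommitted half: the work grows with the size of
        // the *neighbor*, not with the size of the block being freed.
        merged.committed = true;
      } else {
        // Decommit the committed half: the work is bounded by the block
        // being freed, and the resident set stays small.
        merged.committed = false;
      }
      return merged;
    }

The asymmetry is the point: under kCommitAll, freeing a small block next to
a huge decommitted span costs time and commit charge proportional to the
neighbor, not to the block.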
The team is working on benchmarks. I think the evidence of paging is
pretty overwhelming. Paging and jank are far worse than the small perf
boost on DOM node creation. I don't believe the benchmark in question is
a significant driver of primary performance. Do you?

Mike

> yours,
> anton.
>
> > Mike
> >
> >>
> >> > WHY CHANGE?
> >> > The problematic scenario I'm anticipating (and it may currently be
> >> > burning us) is:
> >> > a) A (renderer) process allocates a lot of memory, and reaches a
> >> > significant high-water mark of memory used.
> >> > b) The process deallocates a lot of memory, and it flows into the
> >> > TCMalloc free list. [We still have a lot of memory attributed to
> >> > that process, and the app as a whole shows as using that memory.]
> >> > c) We eventually decide to decommit a lot of our free memory.
> >> > Currently this happens when we switch away from a tab. [This saves
> >> > us from further swapping out the unused memory.]
> >> > Now comes the evil problem.
> >> > d) We return to the tab, which has a giant free list of spans, most
> >> > of which are decommitted. [The good news is that the memory is
> >> > still decommitted.]
> >> > e) We allocate a block of memory, such as a 32k chunk. This memory
> >> > is pulled from a decommitted span, and ONLY the allocated chunk is
> >> > committed. [That sounds good.]
> >> > f) We free the block of memory from (e). Whatever span is adjacent
> >> > to that block is committed <potential oops>. Hence, if we took (e)
> >> > from a 200MB span, the act of freeing (e) will cause a 200MB
> >> > commitment!?! This in turn would not only require touching (and
> >> > having VirtualAlloc clear to zero) all allocated memory in the
> >> > large span, it will also immediately put memory pressure on the OS,
> >> > and force as much as 200MB of other apps' memory to be swapped out
> >> > to disk :-(.
> >>
> >> I'm not sure about swapping unless you touch those now-committed
> >> pages, but only experiment will tell.
> >>
> >> > I'm wary that our recent fix that allows spans to be (correctly)
> >> > coalesced independent of their size will make it easier to coalesce
> >> > spans. Worse yet, as we proceed to further optimize TCMalloc, one
> >> > measure of success will be that the list of spans will be
> >> > fragmented less and less, and we'll have larger and larger
> >> > coalesced singular spans. Any large "reserved" but not "committed"
> >> > span will be a jank time-bomb waiting to blow up if the process
> >> > ever allocates/frees from such a large span :-(.
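Putting rough numbers on steps (d)-(f) above, here is a back-of-the-envelope
model using Jim's hypothetical 200MB span; the figures are illustrative, not
measurements:

    // A toy model of steps (d)-(f): numbers are illustrative only.
    #include <cstddef>
    #include <cstdio>

    int main() {
      const size_t kPageSize = 4096;
      // (d) Back on the tab: one giant free span, fully decommitted.
      size_t decommitted_pages = 200u * 1024 * 1024 / kPageSize;
      size_t committed_pages = 0;

      // (e) Allocate a 32k chunk; only the chunk itself is committed.
      const size_t chunk_pages = 32u * 1024 / kPageSize;
      decommitted_pages -= chunk_pages;
      committed_pages += chunk_pages;  // 8 pages - fine so far

      // (f) Free the chunk. Commit-on-coalesce merges it with its
      // decommitted neighbor and commits the WHOLE merged span.
      committed_pages += decommitted_pages;
      decommitted_pages = 0;

      // Prints ~200 MB of commit charge caused by freeing 32k.
      std::printf("committed after freeing 32k: %zu MB\n",
                  committed_pages * kPageSize / (1024 * 1024));
      return 0;
    }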
> >> >
> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain
> >> > performance, etc.)
> >> > We have at least the following plausible alternative ways to move
> >> > forward with TCMalloc. The overall goal is to avoid wasteful
> >> > decommits, and at the same time avoid heap-wide flailing between
> >> > minimal and maximal span commitment states.
> >> > Each free span is currently the maximal contiguous region of memory
> >> > that TCMalloc controls but that has been deallocated. Currently,
> >> > spans have to be totally committed or totally decommitted; no
> >> > mixture is supported.
> >> > a) We could re-architect the span handling to allow spans to be
> >> > combinations of committed and decommitted regions.
> >> > b) We could vary our policy on what to do with a coalesced span,
> >> > based on span size and memory pressure. For example: we can
> >> > consistently monitor the in-use vs. free (but committed) ratio, and
> >> > try to stay in some "acceptable" region by varying our policy.
> >> > c) We could actually return to the OS some portions of spans that
> >> > we have decommitted. We could then let the OS give us back these
> >> > regions if we need memory. Until we get them back, we would not be
> >> > at risk of doing unnecessary commits. Decisions about when to
> >> > return memory to the OS can be made based on span size and memory
> >> > pressure.
> >> > d) We can change the interval and forcing function for decommitting
> >> > spans that are in our free list.
> >> > In each of the above cases, we need benchmark data on user-class
> >> > machines to show the costs of these changes. Until we understand
> >> > the memory impact, we need to move forward conservatively in our
> >> > actions, and be vigilant for thrashing scenarios.
> >> >
> >> > Comments??
> >>
> >> As a related attempt, you may have a look at
> >> http://codereview.chromium.org/256013/show
> >>
> >> That allows spans with a mix of committed/decommitted pages (but only
> >> in the returned list), as committing seems to work fine if some pages
> >> are already committed.
> >>
> >> That has some minor performance benefit, but I haven't investigated
> >> it in detail yet.
> >>
> >> just my 2 cents,
> >> anton.
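For what it's worth, a minimal sketch of the direction in alternative (a)
and in Anton's CL: a span that tracks per-page commit state, so that
merging free spans never forces a whole-span commit or decommit. MixedSpan
is a hypothetical structure, not the code in the CL above:

    // Hypothetical mixed-commit span; not the code in the CL above.
    #include <cstddef>
    #include <vector>

    struct MixedSpan {
      size_t start_page;
      std::vector<bool> page_committed;  // one flag per page in the span

      size_t num_pages() const { return page_committed.size(); }

      // Merging an adjacent free span is just concatenating page states;
      // no CommitPages/DecommitPages call is needed at merge time.
      void AppendAdjacent(const MixedSpan& next) {
        page_committed.insert(page_committed.end(),
                              next.page_committed.begin(),
                              next.page_committed.end());
      }

      // Commit work moves to allocation time and is bounded by the size
      // of the request, never by the size of the span.
      void CommitRange(size_t first, size_t count) {
        for (size_t i = first; i < first + count; ++i)
          page_committed[i] = true;  // real code would commit via the OS here
      }
    };

That would defuse the step (f) time bomb directly, at the cost of more
bookkeeping per span.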
