On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
> >> >> > If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
> >> >> > This post is meant to gather some discussion on a topic before I code and land a change.
> >> >> >
> >> >> > MOTIVATION
> >> >> > We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
> >> >> >
> >> >> > SUMMARY OF CHANGE
> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list and gets coalesced with an adjacent span that is already decommitted, the coalesced span is entirely decommitted (as opposed to our current customized behavior of committing the entire span).
> >> >> > This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
> >> >>
> >> >> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
> >> >>
> >> >> Plea: please do not regress DOM performance unless there are really compelling reasons. And even in this case :)
> >> >
> >> > Anton -
> >> > All the evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself.
> >> > Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win.
> >> > I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that.
> >> > But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now. Using the current algorithms, we have literally multi-hundred-megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been observed with similar behavior (loading the histograms page).
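For concreteness, a minimal sketch of the coalescing policy summarized above. The Span structure and the MergeSpans/DecommitPages helpers are simplified stand-ins, not the actual TCMalloc page-heap code:

#include <cstddef>
#include <cstdint>

// Simplified bookkeeping: today a span is either fully committed or fully
// decommitted; there is no mixed state.
struct Span {
  uintptr_t start_page;  // first page number covered by the span
  size_t num_pages;      // span length, in pages
  bool committed;        // commit state of the whole span
};

// Hypothetical primitives; the real code talks to the page heap and, on
// Windows, to VirtualAlloc/VirtualFree.
Span* MergeSpans(Span* freed, Span* prev, Span* next);
void DecommitPages(uintptr_t start_page, size_t num_pages);

// Proposed policy: if the freed span coalesces with any neighbor that is
// already decommitted, decommit the whole merged span. The current
// customization instead commits the whole merged span, which is what lets a
// small free() trigger a very large commitment.
Span* CoalesceOnFree(Span* freed, Span* prev, Span* next) {
  const bool neighbor_decommitted =
      (prev && !prev->committed) || (next && !next->committed);
  Span* merged = MergeSpans(freed, prev, next);
  if (neighbor_decommitted) {
    DecommitPages(merged->start_page, merged->num_pages);
    merged->committed = false;
  } else {
    merged->committed = true;  // every piece was already committed
  }
  return merged;
}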
> >> > We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
> >>
> >> Mike, I completely agree that we should reduce memory usage. On the other hand, speed has always been one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging, etc.). That's exactly the reason I asked for some way to quantify the quality of the different approaches, especially given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
> >
> > The team is working on benchmarks.
> > I think the evidence of paging is pretty overwhelming.
> > Paging and jank are far worse than the small perf boost on DOM node creation.
> > I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>
> To some extent. Just to make it clear: I am not insisting; if the consensus is that we should trade DOM performance for reduced memory usage in this case, that's fine. I only want to have real numbers before we make any decision.
>
> @pkasting: it wasn't 3%, it was closer to 8%, if memory serves.

When I checked it in, my records show a 217 -> 210 benchmark drop, which is 3%.

> And one thing I forgot: regarding the policy to decommit spans in ::Delete. Please correct me if I'm wrong, but wouldn't that actually make all the free spans decommitted? A span would only be committed when it gets allocated, no? Decommitting only if any of the adjacent spans is decommitted may keep some spans committed, but it's difficult for me to say how often.

Oh - more work is still needed, yes :-)

Mike

> yours,
> anton.
>
> > Mike
> >
> >> yours,
> >> anton.
> >>
> >> > Mike
> >> >
> >> >> > WHY CHANGE?
> >> >> > The problematic scenario I'm anticipating (and which may currently be burning us) is:
> >> >> > a) A (renderer) process allocates a lot of memory, and achieves a significant high-water mark of memory used.
> >> >> > b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
> >> >> > c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from further swapping out the unused memory.]
> >> >> > Now comes the evil problem.
> >> >> > d) We return to the tab, which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
> >> >> > e) We allocate a block of memory, such as a 32k chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
> >> >> > f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200Meg span, the act of freeing (e) will cause a 200Meg commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all allocated memory in the large span, it will also immediately put memory pressure on the OS and force as much as 200Megs of other apps to be swapped out to disk :-(.
> >> >>
> >> >> I'm not sure about swapping unless you touch those now-committed pages, but only an experiment will tell.
> >> >>
> >> >> > I'm wary that our recent fix, which allows spans to be (correctly) coalesced independent of their size, will make it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans will be fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
> >> >> >
> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, regain performance, etc.)
> >> >> > We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span commitment states.
> >> >> > Each free span is currently the maximal contiguous region of memory that TCMalloc controls but that has been deallocated. Currently spans have to be totally committed or totally decommitted; no mixture is supported.
> >> >> > a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
> >> >> > b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example, we could consistently monitor the in-use vs. free-but-committed ratio and try to stay in some "acceptable" region by varying our policy.
> >> >> > c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
> >> >> > d) We can change the interval and forcing function for decommitting spans that are in our free list.
> >> >> > In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively in our actions, and be vigilant for thrashing scenarios.
> >> >> >
> >> >> > Comments??
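One way to read option (b) above, as a rough sketch; the counters and the 25% threshold below are invented for illustration and are not taken from any actual Chromium code:

#include <cstddef>

// Hypothetical page-heap counters, all measured in pages.
struct HeapStats {
  size_t in_use;          // pages currently handed out to the application
  size_t free_committed;  // free-list pages that are still committed
};

// Option (b): keep the committed-but-free pool roughly proportional to what
// the process is actually using. While under the threshold, keep coalesced
// spans committed for cheap reuse; once over it, prefer decommitting them.
bool ShouldDecommitCoalescedSpan(const HeapStats& stats) {
  const double kMaxFreeCommittedRatio = 0.25;  // made-up tuning constant
  return stats.free_committed >
         static_cast<size_t>(kMaxFreeCommittedRatio * stats.in_use);
}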
> >> >>
> >> >> As a close attempt, you may have a look at http://codereview.chromium.org/256013/show
> >> >>
> >> >> That allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine if some pages are already committed.
> >> >>
> >> >> That has some minor performance benefit, but I haven't investigated it in detail yet.
> >> >>
> >> >> just my 2 cents,
> >> >> anton.
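A rough illustration of the mixed-state idea Anton points to above; the names and the three-state flag are invented here, so the review itself is the authoritative description of what the patch actually does:

#include <cstddef>
#include <cstdint>

// Invented three-state flag: spans parked in the returned list may contain
// both committed and decommitted pages.
enum CommitState { COMMITTED, DECOMMITTED, MIXED };

struct Span {
  uintptr_t start_page;  // first page number covered by the span
  size_t num_pages;      // span length, in pages
  CommitState state;
};

// Stand-in for the platform commit call. On Windows, VirtualAlloc with
// MEM_COMMIT succeeds even when part of the range is already committed,
// which is what makes leaving a span in a mixed state workable.
void CommitPages(uintptr_t start_page, size_t num_pages);

// When coalescing into the returned (decommitted) list, do not eagerly
// commit or decommit anything; just record that the merged span is a mix.
void MergeIntoReturnedList(Span* merged, bool had_committed_piece) {
  merged->state = had_committed_piece ? MIXED : DECOMMITTED;
}

// When the span is later pulled out for allocation, commit the whole range;
// pages that were already committed are simply committed again, harmlessly.
void PrepareForAllocation(Span* span) {
  if (span->state != COMMITTED) {
    CommitPages(span->start_page, span->num_pages);
    span->state = COMMITTED;
  }
}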
