sorry once again---once again wrong account.

On Thu, Oct 1, 2009 at 12:36 AM, Anton Muhin <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 10:22 PM, James Robinson <[email protected]> wrote:
>>
>> On Wed, Sep 30, 2009 at 11:17 AM, Mike Belshe <[email protected]> wrote:
>>>
>>> On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
>>>>
>>>> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
>>>> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
>>>> >>
>>>> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>>>> >> > If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
>>>> >> >
>>>> >> > This post is meant to gather some discussion on a topic before I code and land a change.
>>>> >> >
>>>> >> > MOTIVATION
>>>> >> > We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short, controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
>>>> >> >
>>>> >> > SUMMARY OF CHANGE
>>>> >> > I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list and gets coalesced with an adjacent span that is already decommitted, the coalesced span is entirely decommitted (as opposed to our current customized behavior of committing the entire span).
>>>> >> >
>>>> >> > This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
>>>> >>
>>>> >> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
>>>> >>
>>>> >> Plea: please do not regress DOM performance unless there are really compelling reasons. And even in this case :)
>>>> >
>>>> > Anton -
>>>> >
>>>> > All evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself. Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win. I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that.
>>>> >
>>>> > But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now. Using the current algorithms, we have literally multi-hundred-megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been witnessed with similar behavior (loading the histograms page).
>>>> >
>>>> > We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
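To make the policy Jim proposes above concrete, here is a minimal sketch of the coalescing decision. The Span struct, the CommitPages/DecommitPages helpers, and the decommit_on_merge flag are simplified stand-ins invented for illustration, not TCMalloc's real page-heap code; the only point is the difference between "commit everything on merge" (current) and "decommit everything on merge" (proposed).

    #include <cstddef>

    // Sketch only: simplified stand-ins for TCMalloc's page-heap structures.
    struct Span {
      size_t start_page;  // first page of the span
      size_t num_pages;   // length in pages
      bool committed;     // are these pages committed to physical memory?
    };

    // Hypothetical system hooks; real code would commit/decommit address space
    // (e.g. via VirtualAlloc(MEM_COMMIT) / VirtualFree(MEM_DECOMMIT) on Windows).
    void CommitPages(size_t start_page, size_t num_pages) {}
    void DecommitPages(size_t start_page, size_t num_pages) {}

    // Merge a freshly freed span with an adjacent free span whose commit state
    // may differ.  Current Chromium behavior: commit everything (fast reuse,
    // but possibly a very large commit).  Proposed behavior: decommit everything.
    Span Coalesce(const Span& freed, const Span& neighbor, bool decommit_on_merge) {
      Span merged;
      merged.start_page = freed.start_page < neighbor.start_page
                              ? freed.start_page : neighbor.start_page;
      merged.num_pages = freed.num_pages + neighbor.num_pages;

      if (freed.committed == neighbor.committed) {
        merged.committed = freed.committed;  // states already agree
        return merged;
      }

      if (decommit_on_merge) {
        // Proposed policy: the merged span ends up entirely decommitted.
        if (freed.committed) DecommitPages(freed.start_page, freed.num_pages);
        if (neighbor.committed) DecommitPages(neighbor.start_page, neighbor.num_pages);
        merged.committed = false;
      } else {
        // Current policy: the merged span ends up entirely committed, even if
        // the decommitted side is much larger than the span just freed.
        if (!freed.committed) CommitPages(freed.start_page, freed.num_pages);
        if (!neighbor.committed) CommitPages(neighbor.start_page, neighbor.num_pages);
        merged.committed = true;
      }
      return merged;
    }

Under the proposed policy, a small committed chunk freed next to a huge decommitted neighbor stays decommitted instead of dragging the whole neighbor back in, which is exactly the case Jim's scenario below worries about.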
>>>> Mike, I completely agree that we should reduce memory usage. On the other hand, speed was always one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging etc.). That's exactly the reason I asked for some way to quantify the quality of different approaches, esp. given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
>>>
>>> The team is working on benchmarks.
>>>
>>> I think the evidence of paging is pretty overwhelming.
>>>
>>> Paging and jank are far worse than the small perf boost on DOM node creation. I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>>
>> I agree completely that this seems to be an issue. Here's what about:tcmalloc says about my browser process right now (which is at around 267MB according to the app's Task Manager):
>>
>> MALLOC: 207097856 (  197.5 MB) Heap size
>> MALLOC:  12494760 (   11.9 MB) Bytes in use by application
>> MALLOC: 188563456 (  179.8 MB) Bytes free in page heap
>>
>> Seems like just a little bit too much memory is being committed for 12MB of live objects.
>>
>> It might be possible to have it both ways by keeping a small 'buffer' of committed pages rather than always committing or decommitting everything in the free lists. If we kept a limited-size buffer of committed pages around just for 'new' allocations, but tried to decommit everything past the buffer, it should be possible to keep allocations fast without blowing through tons of memory. I'm going to try to experiment with this a bit and see if it looks promising.
>
> That is one of several heuristics implemented in JSC's version of tcmalloc (see http://trac.webkit.org/changeset/46511, which was massaged and brought into tcmalloc). I was originally thinking that IdleHandler is invoked when the thread is idle, which would allow us to decommit committed pages, but then I learned that isn't the case.
>
> I'd be curious to see if periodic (JSC's style) or idle scavenging could satisfy both allocation peaks and lowered memory usage.
>
> yours,
> anton.
>
>> - James
>>
>>> Mike
>>>
>>>> yours,
>>>> anton.
>>>>
>>>> > Mike
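James's "small buffer of committed pages" idea and the periodic/idle scavenging Anton mentions from JSC's tcmalloc share the same shape: keep a bounded amount of committed-but-free memory around for fast allocation and decommit the rest on a timer or when idle. Here is a rough sketch of that shape; the class, the 10MB target, and the free-list representation are all made up for illustration and are not the actual TCMalloc or JSC code.

    #include <cstddef>
    #include <list>

    // Illustrative only: a free list of spans, each wholly committed or not.
    struct FreeSpan {
      size_t bytes;
      bool committed;
    };

    class ScavengerSketch {
     public:
      // Upper bound on committed-but-free memory kept as a fast-allocation
      // buffer; the value is arbitrary for the example.
      static const size_t kCommittedBufferBytes = 10 * 1024 * 1024;

      // Would be driven by a timer or an idle handler in the renderer.
      void Scavenge() {
        size_t committed_free = CommittedFreeBytes();
        for (std::list<FreeSpan>::iterator it = free_spans_.begin();
             it != free_spans_.end() && committed_free > kCommittedBufferBytes;
             ++it) {
          if (!it->committed)
            continue;
          it->committed = false;   // real code: call the system decommit hook
          committed_free -= it->bytes;
        }
      }

     private:
      size_t CommittedFreeBytes() const {
        size_t total = 0;
        for (std::list<FreeSpan>::const_iterator it = free_spans_.begin();
             it != free_spans_.end(); ++it) {
          if (it->committed)
            total += it->bytes;
        }
        return total;
      }

      std::list<FreeSpan> free_spans_;
    };

Whether something like this satisfies both allocation peaks and low idle memory is exactly the open question; the scavenge interval and the buffer size are the knobs a benchmark would have to exercise.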
>>>> >> > WHY CHANGE?
>>>> >> > The problematic scenario I'm anticipating (and may currently be burning us) is:
>>>> >> > a) A (renderer) process allocates a lot of memory, and reaches a significant high-water mark of memory used.
>>>> >> > b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
>>>> >> > c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from further swapping out the unused memory.]
>>>> >> > Now comes the evil problem.
>>>> >> > d) We return to the tab, which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
>>>> >> > e) We allocate a block of memory, such as a 32KB chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
>>>> >> > f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200MB span, the act of freeing (e) will cause a 200MB commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all of the memory in the large span, it would also immediately put memory pressure on the OS and force as much as 200MB of other apps' memory to be swapped out to disk :-(.
>>>> >>
>>>> >> I'm not sure about swapping unless you touch those now-committed pages, but only experiment will tell.
>>>> >>
>>>> >> > I'm wary that our recent fix, which allows spans to be (correctly) coalesced independent of their size, makes it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans will be fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
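For anyone less familiar with the Windows terms used in the scenario above, the reserve/commit/decommit distinction it relies on maps onto VirtualAlloc/VirtualFree roughly as in the standalone sketch below (an illustration only, not Chromium code; error handling omitted). Decommitted pages keep their address range reserved but carry no physical backing or commit charge; re-committing them charges against system commit and hands back zeroed pages on first touch, which is where the pressure in step (f) comes from.

    #include <windows.h>

    int main() {
      const SIZE_T kSpan = 200 * 1024 * 1024;  // a 200MB "span" of address space
      const SIZE_T kChunk = 32 * 1024;         // a 32KB allocation carved from it

      // Reserve address space only: cheap, no physical pages behind it.
      char* span = static_cast<char*>(
          VirtualAlloc(NULL, kSpan, MEM_RESERVE, PAGE_NOACCESS));

      // Commit just the 32KB chunk that is actually being used (step e).
      VirtualAlloc(span, kChunk, MEM_COMMIT, PAGE_READWRITE);
      span[0] = 1;  // first touch: the OS supplies a zeroed physical page

      // Committing the whole 200MB span (what step f does today) charges the
      // full amount against system commit at once:
      //   VirtualAlloc(span, kSpan, MEM_COMMIT, PAGE_READWRITE);

      // Decommit: physical backing and commit charge go away, while the
      // address range stays reserved for a later re-commit.
      VirtualFree(span, kChunk, MEM_DECOMMIT);

      // Release the reservation entirely (size must be 0 with MEM_RELEASE).
      VirtualFree(span, 0, MEM_RELEASE);
      return 0;
    }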
>>>> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain performance, etc.)
>>>> >> > We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span commitment states.
>>>> >> > Each free span is currently the maximal contiguous region of memory that TCMalloc is controlling but that has been deallocated. Currently spans have to be totally committed or totally decommitted. There is no mixture supported.
>>>> >> > a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
>>>> >> > b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example: we can continuously monitor the in-use vs. free (but committed) ratio, and try to stay in some "acceptable" region by varying our policy.
>>>> >> > c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
>>>> >> > d) We can change the interval and forcing function for decommitting spans that are in our free list.
>>>> >> > In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively and be vigilant for thrashing scenarios.
>>>> >> >
>>>> >> > Comments??
>>>> >>
>>>> >> As a close approximation you may have a look at http://codereview.chromium.org/256013/show
>>>> >>
>>>> >> That allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine even if some pages are already committed.
>>>> >>
>>>> >> That has some minor performance benefit, but I didn't investigate it in detail yet.
>>>> >>
>>>> >> just my 2 cents,
>>>> >> anton.
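One way to picture alternative (a) above, and the "mix of committed/decommitted pages" that Anton's codereview link allows in the returned list, is a span that tracks commit state per page rather than for the whole span: carving out an allocation then commits only the pages it is missing, and freeing back into the span never forces the rest of it to be committed. The sketch below only illustrates that idea with invented names; it is not the representation used in that CL or in TCMalloc.

    #include <cstddef>
    #include <vector>

    // Illustration only: a free span whose pages can be individually committed.
    struct MixedSpan {
      size_t first_page;            // page number of the start of the span
      std::vector<bool> committed;  // one entry per page in the span

      // Commit exactly the pages in [offset, offset + n) that are not already
      // committed; commit_one stands in for the per-page system commit hook.
      template <typename CommitOnePage>
      void EnsureCommitted(size_t offset, size_t n, CommitOnePage commit_one) {
        for (size_t i = offset; i < offset + n; ++i) {
          if (!committed[i]) {
            commit_one(first_page + i);
            committed[i] = true;
          }
        }
      }

      // How many committed-but-free pages this span is holding, i.e. the
      // quantity a memory-pressure policy (alternative b) would want to bound.
      size_t CommittedPageCount() const {
        size_t count = 0;
        for (size_t i = 0; i < committed.size(); ++i)
          if (committed[i]) ++count;
        return count;
      }
    };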
