Guys, just to summarize the discussion. There are several ways we can tweak tcmalloc:
1) decommit everything that is free; 2) keep spans in a mixed state (some pages committed, some not; coalescing neither commits nor decommits), which should address Jim's main argument; 3) commit on coalescing, but purge aggressively (as WebKit does: once every 5 seconds unless something else has been committed, or during idle pauses).

To my knowledge, performance-wise 1) is slower (how much slower we still need to learn) and 2) is slightly faster than 3) (though that might just be statistical noise). Of course, my benchmark is quite special. Memory-wise I think 2) and 3) with aggressive scavenging should be mostly the same: we could keep a higher number of committed pages than in 1), but only for short periods of time, and I'm not convinced that's a bad thing.

Overall I'm for 2) and 3), but I am definitely biased. What do you think?

And many thanks to Vitaly for the discussion.

yours,
anton.
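To make the three options concrete, here is a rough sketch of where they differ when a freed span is coalesced with a neighbor. The types and helpers are hypothetical stand-ins, not the actual tcmalloc code:

    #include <cstddef>

    // Hypothetical stand-ins for tcmalloc's span bookkeeping and for the
    // VirtualAlloc/VirtualFree-based page operations.
    struct Span {
      std::size_t pages;
      bool committed;
    };
    void CommitPages(Span&) {}
    void DecommitPages(Span&) {}
    void ScheduleScavenge(int /*delay_secs*/) {}

    enum class FreePolicy {
      kDecommitAllFree,  // 1) decommit everything that is free
      kMixedSpans,       // 2) coalesce only; pages keep their commit state
      kCommitAndPurge    // 3) commit on coalescing, purge aggressively later
    };

    Span Coalesce(const Span& freed, const Span& neighbor, FreePolicy policy) {
      Span merged{freed.pages + neighbor.pages, /*committed=*/false};
      switch (policy) {
        case FreePolicy::kDecommitAllFree:
          DecommitPages(merged);  // free memory goes straight back to the OS
          break;
        case FreePolicy::kMixedSpans:
          // Neither commit nor decommit: the merged span would track a mix
          // of committed and decommitted pages (bookkeeping not shown).
          break;
        case FreePolicy::kCommitAndPurge:
          CommitPages(merged);    // current behavior on coalescing...
          ScheduleScavenge(5);    // ...plus a WebKit-style periodic purge
          break;
      }
      return merged;
    }

Under option 2 the real work is in the bookkeeping that tracks which pages of a span are committed; a possible shape for that is sketched at the end of the thread.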
On Thu, Oct 1, 2009 at 3:56 AM, James Robinson <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 2:28 PM, James Robinson <[email protected]> wrote:
>> On Wed, Sep 30, 2009 at 11:29 AM, Anton Muhin <[email protected]> wrote:
>>> On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <[email protected]> wrote:
>>> > On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
>>> >> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
>>> >> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
>>> >> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
>>> >> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
>>> >> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>>> >> >> >> > If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
>>> >> >> >> > This post is meant to gather some discussion on a topic before I code and land a change.
>>> >> >> >> >
>>> >> >> >> > MOTIVATION
>>> >> >> >> > We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
>>> >> >> >> >
>>> >> >> >> > SUMMARY OF CHANGE
>>> >> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list, and it gets coalesced with an adjacent span that is already decommitted, the coalesced span should be entirely decommitted (as opposed to our current customized behavior of committing the entire span).
>>> >> >> >> > This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
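In code, the proposed rule amounts to roughly the following (hypothetical names and trivial stubs, not the actual patch):

    struct Span { bool committed; };
    bool IsDecommitted(const Span* s) { return !s->committed; }
    void CommitPages(Span* s)   { s->committed = true;  }  // e.g. VirtualAlloc(MEM_COMMIT)
    void DecommitPages(Span* s) { s->committed = false; }  // e.g. VirtualFree(MEM_DECOMMIT)
    Span* Merge(Span* a, Span* /*b*/) { return a; }        // bookkeeping only in this sketch

    Span* CoalesceOnFree(Span* freed, Span* neighbor) {
      Span* merged = Merge(freed, neighbor);
      if (IsDecommitted(neighbor)) {
        DecommitPages(merged);  // proposed: follow the decommitted neighbor
      } else {
        CommitPages(merged);    // current: always fully commit the merged span
      }
      return merged;
    }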
>>> >> >> >>
>>> >> >> >> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
>>> >> >> >>
>>> >> >> >> Plea: please, do not regress DOM performance unless there are really compelling reasons. And even in this case :)
>>> >> >> >
>>> >> >> > Anton -
>>> >> >> > All evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself. Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win. I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that.
>>> >> >> > But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now. Using the current algorithms, we have literally multi-hundred-megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been witnessed which have similar behavior (loading the histograms page).
>>> >> >> > We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
>>> >> >>
>>> >> >> Mike, I completely agree that we should reduce memory usage. On the other hand, speed has always been one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging etc.). That's exactly why I asked for some way to quantify the quality of the different approaches, especially given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
>>> >> >
>>> >> > The team is working on benchmarks.
>>> >> > I think the evidence of paging is pretty overwhelming. Paging and jank are far worse than the small perf boost on DOM node creation. I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>>> >>
>>> >> To some extent. Just to make it clear: I am not insisting; if the consensus is that we should trade DOM performance for reduced memory usage in this case, that's fine. I only want to have real numbers before we make any decision.
>>> >>
>>> >> @pkasting: it wasn't 3%, it was closer to 8% if memory serves.
>>> >
>>> > When I checked it in, my records show a 217 -> 210 benchmark drop, which is 3%.
>>>
>>> My numbers were substantially bigger, but anyway we need to remeasure it; there are too many factors.
>>
>> I did some measurements on my Windows machine comparing the current behavior (always commit spans when merging them together) with a very conservative alternative (always decommit spans on ::Delete, including the just-released one). The interesting bits are the benchmark scores and memory use at the end of the run.
>> For the DOM benchmark, the score regressed from an average over 4 runs of 188.25 to 185, which is <2%. The peak memory is about the same, but the memory committed by the tab at the end of the run decreased from an average of 642MB to 57MB, which is a 91% reduction. 4 runs probably isn't enough to make a definitive statement about the perf impact, but I think the memory impact is pretty clear. The memory characteristics of the V8 benchmark were unchanged, but the performance dropped from an average of 3009 to 2944, which is about 2%. SunSpider did not change at all in either memory or performance.
>
> Sorry, disregard those DOM numbers (I wasn't running the right test). I re-ran on Dromaeo's DOM Core test suite twice with and without the aggressive decommitting and the numbers are:
>
> r23768 unmodified:
> scores: 299.36 run/s, 302.47 run/s
> memory footprint of renderer at end of run: 333,648KB, 334,156KB
>
> r23768 with decommitting:
> scores: 296.06 run/s, 293.88 run/s
> memory footprint of renderer at end of run: 91,856KB, 68,208KB
>
> I think if the tradeoff is <2% perf against 3-5x memory use, it's better to get more conservative with our memory use first and then figure out how to earn back the perf impact without blowing the memory use sky-high again. I think it's pretty clear we don't need all 200MB of extra committed memory in order to do 3 more runs per second.
>
> - James
>>
>> - James
>>>
>>> yours,
>>> anton.
>>>
>>> >>
>>> >> And one thing I forgot: regarding the policy of decommitting spans in ::Delete. Please correct me if I'm wrong, but wouldn't that actually make all free spans decommitted? A span would only be committed while it is allocated, no? Decommitting only if one of the adjacent spans is decommitted may keep some spans committed, but it's difficult for me to say how often.
>>> >
>>> > Oh - more work is still needed, yes :-)
>>> >
>>> > Mike
>>> >>
>>> >> yours,
>>> >> anton.
>>> >> >
>>> >> > Mike
>>> >> >>
>>> >> >> yours,
>>> >> >> anton.
>>> >> >> >
>>> >> >> > Mike
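For reference, the conservative alternative James measured (and the one Anton's question above is about) comes down to something like this sketch, with hypothetical helper names:

    struct Span { bool committed = true; };
    Span* CoalesceWithNeighbors(Span* s) { return s; }     // bookkeeping only here
    void DecommitPages(Span* s) { s->committed = false; }  // e.g. VirtualFree(MEM_DECOMMIT)
    void InsertIntoFreeList(Span*) {}

    // Every delete decommits the whole merged span, including the pages just
    // released; as noted above, a span is then committed only while allocated.
    void DeleteSpan(Span* span) {
      Span* merged = CoalesceWithNeighbors(span);
      DecommitPages(merged);
      InsertIntoFreeList(merged);
    }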
>>> >> >> >> > WHY CHANGE?
>>> >> >> >> > The problematic scenario I'm anticipating (and which may currently be burning us) is:
>>> >> >> >> > a) A (renderer) process allocates a lot of memory, and reaches a significant high-water mark of memory used.
>>> >> >> >> > b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
>>> >> >> >> > c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from further swapping out the unused memory.]
>>> >> >> >> > Now comes the evil problem.
>>> >> >> >> > d) We return to the tab, which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
>>> >> >> >> > e) We allocate a block of memory, such as a 32k chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
>>> >> >> >> > f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200MB span, the act of freeing (e) will cause a 200MB commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all allocated memory in the large span, it will also immediately put memory pressure on the OS, and force as much as 200MB of other apps to be swapped out to disk :-(.
>>> >> >> >>
>>> >> >> >> I'm not sure about swapping unless you touch those now-committed pages, but only experiment will tell.
>>> >> >> >>
>>> >> >> >> > I'm wary that our recent fix that allows spans to be (correctly) coalesced independent of their size will make it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans is fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
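Concretely, with the current commit-on-coalesce policy, steps (e)-(f) above play out roughly like this (hypothetical sizes and stand-in helpers, not the real allocation path):

    #include <cstddef>

    struct Span { std::size_t bytes; bool committed; };
    void* AllocateFrom(Span&, std::size_t) { return nullptr; }  // commits only the chunk
    void CommitPages(Span& s) { s.committed = true; }           // commits the whole span

    void JankTimeBomb() {
      Span big{200u << 20, /*committed=*/false};  // (d) 200MB free span, decommitted
      void* p = AllocateFrom(big, 32u << 10);     // (e) commits only the 32k chunk: fine
      (void)p;
      // (f) freeing the chunk coalesces it with `big`; the current policy then
      // commits the entire merged span:
      CommitPages(big);  // ~200MB committed (and zeroed) at once: paging and jank
    }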
>>> >> >> >> >
>>> >> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, regain performance, etc.)
>>> >> >> >> > We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span-commitment states.
>>> >> >> >> > Each free span is currently the maximal contiguous region of memory that TCMalloc is controlling but that has been deallocated. Currently spans have to be totally committed or totally decommitted. There is no mixture supported.
>>> >> >> >> > a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
>>> >> >> >> > b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example: we could consistently monitor the in-use vs. free-but-committed ratio, and try to stay in some "acceptable" region by varying our policy.
>>> >> >> >> > c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
>>> >> >> >> > d) We can change the interval and forcing function for decommitting spans that are in our free list.
>>> >> >> >> > In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively in our actions, and be vigilant for thrashing scenarios.
>>> >> >> >> >
>>> >> >> >> > Comments??
>>> >> >> >>
>>> >> >> >> As a close attempt, you may have a look at http://codereview.chromium.org/256013/show
>>> >> >> >>
>>> >> >> >> It allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine even if some pages are already committed.
>>> >> >> >>
>>> >> >> >> That has some minor performance benefit, but I haven't investigated it in detail yet.
>>> >> >> >>
>>> >> >> >> just my 2 cents,
>>> >> >> >> anton.
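For illustration, the mixed-state idea behind that CL (and behind alternative (a) above) might look something like this sketch; the layout is hypothetical, and the actual CL keeps the mix only in the returned list:

    #include <cstddef>
    #include <vector>

    // A span that tracks per-page commit state, so coalescing is pure
    // bookkeeping and a later commit touches only the pages that need it.
    struct MixedSpan {
      std::vector<bool> committed;  // one flag per page

      void MergeRight(const MixedSpan& next) {
        // No commit or decommit here: each page keeps its own state.
        committed.insert(committed.end(), next.committed.begin(),
                         next.committed.end());
      }

      void EnsureCommitted(std::size_t first, std::size_t count) {
        for (std::size_t i = first; i < first + count; ++i) {
          if (!committed[i]) {
            // A real CommitPage(i) would go here, e.g. VirtualAlloc(MEM_COMMIT).
            committed[i] = true;
          }
        }
      }
    };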
