On Wed, Sep 30, 2009 at 2:28 PM, James Robinson <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 11:29 AM, Anton Muhin <[email protected]> wrote:
>> On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <[email protected]> wrote:
>>> On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
>>>> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
>>>>> On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
>>>>>> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
>>>>>>> On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
>>>>>>>> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>>>>>>>>> If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
>>>>>>>>> This post is meant to gather some discussion on a topic before I code and land a change.
>>>>>>>>>
>>>>>>>>> MOTIVATION
>>>>>>>>> We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short, controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
>>>>>>>>>
>>>>>>>>> SUMMARY OF CHANGE
>>>>>>>>> I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list and it gets coalesced with an adjacent span that is already decommitted, the coalesced span should be entirely decommitted (as opposed to our current customized behavior of committing the entire span).
>>>>>>>>> This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
>>>>>>>>
>>>>>>>> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
>>>>>>>>
>>>>>>>> Plea: please, do not regress DOM performance unless there are really compelling reasons. And even in this case :)
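To make the two behaviors being compared concrete, here is a toy sketch of the coalescing step described in the SUMMARY OF CHANGE: the current "always commit the merged span" behavior versus the proposed "decommit the merged span if it absorbed a decommitted neighbor" behavior. The Span struct, Commit/Decommit helpers, and CoalesceOnFree function below are invented stand-ins for illustration, not the real TCMalloc page-heap code:

    #include <cstddef>
    #include <cstdio>
    #include <initializer_list>

    static const size_t kPageSize = 4096;
    static size_t g_committed_bytes = 0;  // what VirtualAlloc(MEM_COMMIT) would be charging us

    struct Span {
      size_t pages;
      bool committed;
    };

    void Commit(Span* s) {
      if (!s->committed) { g_committed_bytes += s->pages * kPageSize; s->committed = true; }
    }
    void Decommit(Span* s) {
      if (s->committed) { g_committed_bytes -= s->pages * kPageSize; s->committed = false; }
    }

    enum Policy { kAlwaysCommitMerged, kDecommitIfNeighborDecommitted };

    // A freshly freed (committed) span is coalesced with an adjacent free span
    // that may already be decommitted.
    Span CoalesceOnFree(Span freed, Span neighbor, Policy policy) {
      if (policy == kAlwaysCommitMerged) {
        Commit(&neighbor);   // current behavior: the merged span ends up fully committed
      } else if (!neighbor.committed) {
        Decommit(&freed);    // proposed behavior: the merged span ends up fully decommitted
      }
      Span merged = { freed.pages + neighbor.pages, freed.committed };
      return merged;
    }

    int main() {
      for (Policy policy : { kAlwaysCommitMerged, kDecommitIfNeighborDecommitted }) {
        Span freed    = { 8, true };      // a 32k block that was just freed
        Span neighbor = { 2048, false };  // an adjacent 8MB free span, already decommitted
        g_committed_bytes = freed.pages * kPageSize;
        Span merged = CoalesceOnFree(freed, neighbor, policy);
        std::printf("policy %d: merged %zu pages, %zu KB committed afterwards\n",
                    policy, merged.pages, g_committed_bytes / 1024);
      }
      return 0;
    }

Under the first policy, freeing the 32k block is what forces the decommitted neighbor to be committed; under the second, the free leaves the whole merged span decommitted, which is the behavior Jim is proposing.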
>>>>>>> Anton -
>>>>>>>
>>>>>>> All evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself. Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win.
>>>>>>>
>>>>>>> I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that. But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now. Using the current algorithms, we have literally multi-hundred-megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been witnessed which have similar behavior (loading the histograms page).
>>>>>>>
>>>>>>> We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
>>>>>>
>>>>>> Mike, I completely agree that we should reduce memory usage. On the other hand, speed has always been one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging etc.). That's exactly the reason I asked for some way to quantify the quality of different approaches, especially given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
>>>>>
>>>>> The team is working on benchmarks.
>>>>> I think the evidence of paging is pretty overwhelming. Paging and jank are far worse than the small perf boost on DOM node creation. I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>>>>
>>>> To some extent. Just to make it clear: I am not insisting; if the consensus is that we should trade DOM performance for reduced memory usage in this case, that's fine. I only want to have real numbers before we make any decision.
>>>>
>>>> @pkasting: it wasn't 3%, it was closer to 8% if memory serves.
>>>
>>> When I checked it in, my records show a 217 -> 210 benchmark drop, which is 3%.
>>
>> My numbers were substantially bigger, but anyway we need to remeasure it---there are too many factors.
>
> I did some measurements on my Windows machine comparing the current behavior (always commit spans when merging them together) with a very conservative alternative (always decommit spans on ::Delete, including the just-released one). The interesting bits are the benchmark scores and memory use at the end of the run.
>
> For the DOM benchmark, the score regressed from an average over 4 runs of 188.25 to 185, which is <2%. The peak memory is about the same, but the memory committed by the tab at the end of the run decreased from an average of 642MB to 57MB, which is a 91% reduction. 4 runs probably isn't enough to make a definitive statement about the perf impact, but I think the memory impact is pretty clear. The memory characteristics of the V8 benchmark were unchanged, but the performance dropped from an average of 3009 to 2944, which is about 2%. Sunspider did not change at all in either memory or performance.
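The "very conservative alternative" measured above is also easy to picture. A minimal sketch, assuming invented Span and DeleteSpan stand-ins rather than the real page-heap ::Delete path, in which every span handed back is decommitted immediately:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Span {
      size_t pages;
      bool committed;
    };

    static const size_t kPageSize = 4096;
    static size_t g_committed_bytes = 0;
    static std::vector<Span> g_free_list;

    void Decommit(Span* s) {
      if (s->committed) { g_committed_bytes -= s->pages * kPageSize; s->committed = false; }
    }

    // The conservative variant: every span returned to the page heap is
    // decommitted right away, including the one that was just released.
    // (Coalescing with adjacent free spans is omitted from this sketch.)
    void DeleteSpan(Span s) {
      Decommit(&s);
      g_free_list.push_back(s);
    }

    int main() {
      Span small = { 8, true };       // ~32k being released
      Span large = { 51200, true };   // ~200MB being released
      g_committed_bytes = (small.pages + large.pages) * kPageSize;
      DeleteSpan(small);
      DeleteSpan(large);
      std::printf("free-list spans: %zu, committed bytes still held: %zu\n",
                  g_free_list.size(), g_committed_bytes);
      return 0;
    }

The cost of this variant is that any reuse of those spans has to re-commit (and re-zero) the pages, which is presumably where the roughly 2% benchmark hit in the numbers above comes from.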
Sorry, disregard those DOM numbers (I wasn't running the right test). I re-ran on dromaeo's DOM Core test suite twice with and without the aggressive decommitting, and the numbers are:

r23768 unmodified:
  scores: 299.36 runs/s, 302.47 runs/s
  memory footprint of renderer at end of run: 333,648KB, 334,156KB

r23768 with decommitting:
  scores: 296.06 runs/s, 293.88 runs/s
  memory footprint of renderer at end of run: 91,856KB, 68,208KB

I think if the tradeoff is <2% perf versus 3-5x memory use, it's better to get more conservative with our memory use first and then figure out how to earn back the perf impact without blowing the memory use sky-high again. I think it's pretty clear we don't need all 200MB of extra committed memory in order to do 3 more runs per second.

- James

> - James
>
>> yours,
>> anton.
>>>> And one thing I forgot. Regarding the policy of decommitting spans in ::Delete: please correct me if I'm wrong, but wouldn't that actually make all the free spans decommitted---a span would only be committed when it gets allocated, no? Decommitting only if any of the adjacent spans is decommitted may keep some spans committed, but it's difficult for me to say how often.
>>>
>>> Oh - more work is still needed, yes :-)
>>>
>>> Mike
>>>>>>>>> WHY CHANGE?
>>>>>>>>> The problematic scenario I'm anticipating (and which may currently be burning us) is:
>>>>>>>>> a) A (renderer) process allocates a lot of memory, and reaches a significant high-water mark of memory used.
>>>>>>>>> b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
>>>>>>>>> c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from later swapping out the unused memory.]
>>>>>>>>> Now comes the evil problem.
>>>>>>>>> d) We return to the tab, which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
>>>>>>>>> e) We allocate a block of memory, such as a 32k chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
>>>>>>>>> f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200Meg span, the act of freeing (e) will cause a 200Meg commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all the allocated memory in the large span, it would also immediately put memory pressure on the OS and force as much as 200Megs of other apps to be swapped out to disk :-(.
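For concreteness, here is a back-of-the-envelope accounting of steps (a)-(f) using the figures from the scenario (a 200Meg free span and a 32k allocation). The step-by-step bookkeeping below is purely illustrative and assumes the current commit-the-merged-span policy:

    #include <cstddef>
    #include <cstdio>

    int main() {
      const size_t kKB = 1024;
      const size_t kSpanBytes = 200 * 1024 * kKB;  // the 200Meg span from the scenario
      const size_t kChunkBytes = 32 * kKB;         // the 32k allocation in step (e)

      size_t committed = kSpanBytes;   // (a)/(b): memory allocated, then freed into TCMalloc's free list
      committed -= kSpanBytes;         // (c): tab switched away, free spans get decommitted
      committed += kChunkBytes;        // (d)/(e): back on the tab, a 32k chunk is carved out of the
                                       //          decommitted span and only that chunk is committed
      // (f): freeing the 32k chunk coalesces it with the remaining decommitted neighbor;
      //      under the current "commit the merged span" policy the whole thing is committed.
      committed += kSpanBytes - kChunkBytes;

      std::printf("committed after freeing the 32k chunk: %zu KB (~%zu MB)\n",
                  committed / kKB, committed / (1024 * kKB));
      return 0;
    }

Under the decommit-on-coalesce proposal, step (f) would instead leave the merged span decommitted and the committed figure would stay near zero until the memory is actually allocated again.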
>>>>>>>> I'm not sure about swapping unless you touch those now-committed pages, but only an experiment will tell.
>>>>>>>>
>>>>>>>>> I'm wary because our recent fix that allows spans to be (correctly) coalesced independent of their size makes it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans will be fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
>>>>>>>>>
>>>>>>>>> WHAT IS THE PLAN GOING FORWARD (or how can we do better, regain performance, etc.)
>>>>>>>>> We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span commitment states.
>>>>>>>>> Each free span is currently the maximal contiguous region of memory that TCMalloc is controlling but that has been deallocated. Currently spans have to be totally committed or totally decommitted. There is no mixture supported.
>>>>>>>>> a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
>>>>>>>>> b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example: we could consistently monitor the in-use vs. free-but-committed ratio, and try to stay in some "acceptable" region by varying our policy.
>>>>>>>>> c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
>>>>>>>>> d) We could change the interval and forcing function for decommitting spans that are in our free list.
>>>>>>>>> In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively, and be vigilant for thrashing scenarios.
>>>>>>>>>
>>>>>>>>> Comments??
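Alternative (b) lends itself most directly to a small sketch: decide whether a coalesced span should stay committed based on how much free-but-committed memory the heap is already holding relative to what is in use. The HeapStats struct, the 25% budget, and ShouldDecommitMergedSpan below are all invented for illustration, not an existing TCMalloc knob:

    #include <cstddef>
    #include <cstdio>

    struct HeapStats {
      size_t in_use_bytes;          // live allocations
      size_t free_committed_bytes;  // free-list pages that are still committed
    };

    // Keep free-but-committed memory under some fraction of in-use memory.  Once
    // we are over that budget, prefer decommitting coalesced spans; below it,
    // keep them committed so reuse stays cheap.
    bool ShouldDecommitMergedSpan(const HeapStats& stats, size_t merged_span_bytes) {
      const size_t budget = stats.in_use_bytes / 4;  // allow ~25% slack
      return stats.free_committed_bytes + merged_span_bytes > budget;
    }

    int main() {
      const size_t kMB = 1024 * 1024;
      HeapStats busy    = { 400 * kMB, 20 * kMB };   // plenty in use, little idle-but-committed
      HeapStats bloated = { 100 * kMB, 600 * kMB };  // DOM-benchmark-like: mostly idle pages
      std::printf("busy heap, 8MB merged span: decommit? %d\n",
                  (int)ShouldDecommitMergedSpan(busy, 8 * kMB));
      std::printf("bloated heap, 8MB merged span: decommit? %d\n",
                  (int)ShouldDecommitMergedSpan(bloated, 8 * kMB));
      return 0;
    }

A policy along these lines would keep the cheap-reuse behavior Anton is arguing for when the heap is mostly in use, while converging toward the conservative decommit behavior once hundreds of megabytes are sitting idle in the free list.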
>>>>>>>> As a related attempt, you may have a look at http://codereview.chromium.org/256013/show
>>>>>>>>
>>>>>>>> That allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine if some pages are already committed.
>>>>>>>>
>>>>>>>> That has some minor performance benefit, but I didn't investigate it in detail yet.
>>>>>>>>
>>>>>>>> just my 2 cents,
>>>>>>>> anton.

--
Chromium Developers mailing list: [email protected]
View archives, change email options, or unsubscribe: http://groups.google.com/group/chromium-dev
