On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <[email protected]> wrote:
>
> On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
>>
>> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
>> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
>> >>
>> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
>> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
>> >> >>
>> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>> >> >> > If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
>> >> >> > This post is meant to gather some discussion on a topic before I code and land a change.
>> >> >> >
>> >> >> > MOTIVATION
>> >> >> > We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
>> >> >> >
>> >> >> > SUMMARY OF CHANGE
>> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list, and it gets coalesced with an adjacent span that is already decommitted, the coalesced span should be entirely decommitted (as opposed to our current customized behavior of committing the entire span).
>> >> >> > This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
>> >> >>
>> >> >> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
>> >> >>
>> >> >> Plea: please, do not regress DOM performance unless there are really compelling reasons. And even in this case :)
>> >> >
>> >> > Anton -
>> >> > All evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself. Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win. I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that.
>> >> > But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now.
>> >> > Using the current algorithms, we have literally multi-hundred megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been witnessed which have similar behavior (loading the histograms page).
>> >> > We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
>> >>
>> >> Mike, I completely agree that we should reduce memory usage. On the other hand, speed was always one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging etc.). That's exactly the reason I asked for some way to quantify the quality of different approaches, esp. given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
>> >
>> > The team is working on benchmarks.
>> > I think the evidence of paging is pretty overwhelming.
>> > Paging and jank are far worse than the small perf boost on DOM node creation.
>> > I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>>
>> To some extent. Just to make it clear: I am not insisting; if the consensus is that we should trade performance in DOM for reduced memory usage in this case, that's fine. I only want to have real numbers before we make any decision.
>>
>> @pkasting: it wasn't 3%, it was closer to 8% if memory serves.
>
> When I checked it in, my records show a 217 -> 210 benchmark drop, which is 3%.
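For readers following along in the allocator, below is a minimal, self-contained sketch of the two coalescing policies being compared in this exchange. It is not the actual third_party/tcmalloc code: the Span fields and the CommitPages/DecommitPages helpers are simplified stand-ins for the real page-heap bookkeeping and the underlying VirtualAlloc/VirtualFree (or madvise) calls.

  // Illustrative sketch only, NOT the real Chromium tcmalloc sources.
  #include <cstddef>
  #include <cstdio>

  struct Span {
    size_t start_page = 0;   // first page covered by this span
    size_t num_pages = 0;    // length in pages
    bool committed = true;   // are the backing pages committed?
  };

  // Stand-ins for the real OS calls; in the allocator these would wrap
  // VirtualAlloc(MEM_COMMIT) / VirtualFree(MEM_DECOMMIT) on Windows.
  void CommitPages(Span* s)   { std::printf("commit   %zu pages\n", s->num_pages); }
  void DecommitPages(Span* s) { std::printf("decommit %zu pages\n", s->num_pages); }

  // Called when `freed` is returned to the page heap and found to be
  // adjacent to `neighbor`, which is already sitting on a free list.
  void CoalesceSpans(Span* freed, Span* neighbor) {
    const bool any_decommitted = !freed->committed || !neighbor->committed;

    if (any_decommitted) {
  #if defined(POLICY_COMMIT_COALESCED)
      // Current Chromium behavior described in the thread: make the merged
      // span fully committed.  Cheap for the next allocation, but if the
      // neighbor is a huge decommitted span this commits all of it at once.
      if (!freed->committed) CommitPages(freed);
      if (!neighbor->committed) CommitPages(neighbor);
      freed->committed = neighbor->committed = true;
  #else
      // Jim's proposal: if either side is already decommitted, decommit the
      // whole merged span, so freeing a small block never forces a large
      // commit of an adjacent multi-hundred-megabyte region.
      if (freed->committed) DecommitPages(freed);
      if (neighbor->committed) DecommitPages(neighbor);
      freed->committed = neighbor->committed = false;
  #endif
    }

    // Merge bookkeeping: grow `neighbor` to cover both regions.
    if (freed->start_page < neighbor->start_page)
      neighbor->start_page = freed->start_page;
    neighbor->num_pages += freed->num_pages;
  }

Which branch is compiled in is exactly the policy difference behind the disputed 3% vs. 8% benchmark numbers.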
My numbers were substantially bigger, but anyway we need to remeasure it---there are too many factors.

yours,
anton.

>> And something I forgot. Regarding the policy of decommitting spans in ::Delete: please correct me if I'm wrong, but wouldn't that actually make all the free spans decommitted---a span would only be committed when it gets allocated, no? Decommitting only if one of the adjacent spans is decommitted may keep some spans committed, but it's difficult for me to say how often.
>
> Oh - more work is still needed, yes :-)
>
> Mike
>
>> yours,
>> anton.
>>
>> > Mike
>> >
>> >> yours,
>> >> anton.
>> >>
>> >> > Mike
>> >> >
>> >> >> > WHY CHANGE?
>> >> >> > The problematic scenario I'm anticipating (and which may currently be burning us) is:
>> >> >> > a) A (renderer) process allocates a lot of memory, and achieves a significant high water mark of memory used.
>> >> >> > b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
>> >> >> > c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from further swapping out the unused memory.]
>> >> >> > Now comes the evil problem.
>> >> >> > d) We return to the tab which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
>> >> >> > e) We allocate a block of memory, such as a 32k chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
>> >> >> > f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200Meg span, the act of freeing (e) will cause a 200Meg commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all allocated memory in the large span, it will also immediately put memory pressure on the OS, and force as much as 200Megs of other apps to be swapped out to disk :-(.
>> >> >>
>> >> >> I'm not sure about swapping unless you touch those now-committed pages, but only experiment will tell.
>> >> >>
>> >> >> > I'm wary that our recent fix that allows spans to be (correctly) coalesced independent of their size will make it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans will be fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
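To put a number on the "time bomb" in step (f) of the WHY CHANGE scenario above, here is a toy, standalone walk-through of how much memory ends up committed under the current commit-on-coalesce policy. The 200 MB span and 32 KB chunk are just the figures Jim uses; nothing here is measured data.

  // Toy walk-through of steps (d)-(f) above under commit-on-coalesce.
  #include <cstdio>

  int main() {
    const long long kMB = 1024 * 1024;
    long long committed = 0;           // bytes currently committed

    // (d) Back on the tab: one giant 200 MB free span, fully decommitted.
    long long free_span = 200 * kMB;
    std::printf("(d) committed: %lld MB\n", committed / kMB);

    // (e) Allocate a 32 KB chunk: carve it out and commit only that chunk.
    const long long chunk = 32 * 1024;
    free_span -= chunk;
    committed += chunk;
    std::printf("(e) committed: %lld KB\n", committed / 1024);

    // (f) Free the chunk.  It coalesces with the adjacent decommitted span,
    // and the commit-on-coalesce policy commits the whole merged span.
    committed += free_span;            // ~200 MB committed by one small free
    std::printf("(f) committed: %lld MB after freeing a 32 KB chunk\n",
                committed / kMB);
    // Under the proposed decommit-on-coalesce policy, (f) would instead
    // decommit the 32 KB chunk and leave committed memory near zero.
    return 0;
  }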
>> >> >> >
>> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain performance, etc.)
>> >> >> > We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span commitment states.
>> >> >> > Each free span is currently the maximal contiguous region of memory that TCMalloc is controlling but has been deallocated. Currently spans have to be totally committed, or totally decommitted. There is no mixture supported.
>> >> >> > a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
>> >> >> > b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example: we can consistently monitor the in-use vs. free (but committed) ratio. We can try to stay in some "acceptable" region by varying our policy.
>> >> >> > c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
>> >> >> > d) We can change the interval and forcing function for decommitting spans that are in our free list.
>> >> >> > In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively in our actions, and be vigilant for thrashing scenarios.
>> >> >> >
>> >> >> > Comments??
>> >> >>
>> >> >> As a related attempt, you may have a look at http://codereview.chromium.org/256013/show
>> >> >>
>> >> >> That allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine if some pages are already committed.
>> >> >>
>> >> >> That has some minor performance benefit, but I didn't investigate it in detail yet.
>> >> >>
>> >> >> just my 2 cents,
>> >> >> anton.
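As one way to picture option (b) in Jim's list above, here is a hedged sketch of a ratio-driven decision. The function name, the 25% budget, and the 8 MB cutoff are invented for illustration; none of this is from the thread, from tcmalloc, or from Chromium.

  // Illustrative-only sketch of option (b): decide whether a newly coalesced
  // span should stay committed, based on the in-use vs. free-but-committed
  // ratio.  All names and thresholds here are hypothetical.
  #include <cstddef>

  struct HeapStats {
    size_t in_use_bytes;          // bytes handed out to the application
    size_t free_committed_bytes;  // bytes on free lists but still committed
  };

  // Returns true if the merged span of `span_bytes` should stay committed.
  bool ShouldKeepCoalescedSpanCommitted(const HeapStats& stats,
                                        size_t span_bytes) {
    // Hypothetical knob: keep at most 25% of in-use bytes sitting around
    // as free-but-committed memory.
    const double kMaxFreeCommittedRatio = 0.25;

    // Never keep very large spans committed "just in case"; they are the
    // jank time bombs described earlier in the thread.
    const size_t kAlwaysDecommitBytes = 8 * 1024 * 1024;  // 8 MB, arbitrary
    if (span_bytes >= kAlwaysDecommitBytes) return false;

    const double budget =
        kMaxFreeCommittedRatio * static_cast<double>(stats.in_use_bytes);
    return static_cast<double>(stats.free_committed_bytes + span_bytes) <= budget;
  }

Option (a), spans that mix committed and decommitted pages, is roughly what the codereview Anton links above experiments with for the returned list.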
