On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
> >> >> > If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
> >> >> > This post is meant to gather some discussion on a topic before I code and land a change.
> >> >> >
> >> >> > MOTIVATION
> >> >> > We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
> >> >> >
> >> >> > SUMMARY OF CHANGE
> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list and gets coalesced with an adjacent span that is already decommitted, the coalesced span is entirely decommitted (as opposed to our current customized behavior of committing the entire span).
> >> >> > This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
> >> >>
> >> >> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
> >> >>
> >> >> Plea: please do not regress DOM performance unless there are really compelling reasons. And even in this case :)
> >> >
> >> > Anton -
> >> > All the evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself.
> >> > Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win.
> >> > I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that.
> >> > But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now. Using the current algorithms, we have literally multi-hundred-megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been observed with similar behavior (loading the histograms page).
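For concreteness, a minimal sketch of the coalescing policy summarized above. The Span structure and the MergeSpans/DecommitPages helpers are simplified stand-ins, not the actual TCMalloc page-heap code:

#include <cstddef>
#include <cstdint>

// Simplified bookkeeping: today a span is either fully committed or fully
// decommitted; there is no mixed state.
struct Span {
  uintptr_t start_page;  // first page number covered by the span
  size_t num_pages;      // span length, in pages
  bool committed;        // commit state of the whole span
};

// Hypothetical primitives; the real code talks to the page heap and, on
// Windows, to VirtualAlloc/VirtualFree.
Span* MergeSpans(Span* freed, Span* prev, Span* next);
void DecommitPages(uintptr_t start_page, size_t num_pages);

// Proposed policy: if the freed span coalesces with any neighbor that is
// already decommitted, decommit the whole merged span. The current
// customization instead commits the whole merged span, which is what lets a
// small free() trigger a very large commitment.
Span* CoalesceOnFree(Span* freed, Span* prev, Span* next) {
  const bool neighbor_decommitted =
      (prev && !prev->committed) || (next && !next->committed);
  Span* merged = MergeSpans(freed, prev, next);
  if (neighbor_decommitted) {
    DecommitPages(merged->start_page, merged->num_pages);
    merged->committed = false;
  } else {
    merged->committed = true;  // every piece was already committed
  }
  return merged;
}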
> >> > We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
> >>
> >> Mike, I completely agree that we should reduce memory usage. On the other hand, speed has always been one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging, etc.). That's exactly the reason I asked for some way to quantify the quality of the different approaches, especially given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
> >
> > The team is working on benchmarks.
> > I think the evidence of paging is pretty overwhelming.
> > Paging and jank are far worse than the small perf boost on DOM node creation.
> > I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>
> To some extent. Just to make it clear: I am not insisting; if the consensus is that we should trade DOM performance for reduced memory usage in this case, that's fine. I only want to have real numbers before we make any decision.
>
> @pkasting: it wasn't 3%, it was closer to 8%, if memory serves.

When I checked it in, my records show a 217 -> 210 benchmark drop, which is 3%.

> And one thing I forgot: regarding the policy to decommit spans in ::Delete. Please correct me if I'm wrong, but wouldn't that actually make all the free spans decommitted? A span would only be committed when it gets allocated, no? Decommitting only if any of the adjacent spans is decommitted may keep some spans committed, but it's difficult for me to say how often.

Oh - more work is still needed, yes :-)

Mike

> yours,
> anton.
>
> > Mike
> >
> >> yours,
> >> anton.
> >>
> >> > Mike
> >> >
> >> >> > WHY CHANGE?
> >> >> > The problematic scenario I'm anticipating (and which may currently be burning us) is:
> >> >> > a) A (renderer) process allocates a lot of memory, and achieves a significant high-water mark of memory used.
> >> >> > b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
> >> >> > c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from further swapping out the unused memory.]
> >> >> > Now comes the evil problem.
> >> >> > d) We return to the tab, which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
> >> >> > e) We allocate a block of memory, such as a 32k chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
> >> >> > f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200Meg span, the act of freeing (e) will cause a 200Meg commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all allocated memory in the large span, it will also immediately put memory pressure on the OS and force as much as 200Megs of other apps to be swapped out to disk :-(.
> >> >>
> >> >> I'm not sure about swapping unless you touch those now-committed pages, but only an experiment will tell.
> >> >>
> >> >> > I'm wary that our recent fix, which allows spans to be (correctly) coalesced independent of their size, will make it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans will be fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
> >> >> >
> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, regain performance, etc.)
> >> >> > We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span commitment states.
> >> >> > Each free span is currently the maximal contiguous region of memory that TCMalloc controls but that has been deallocated. Currently spans have to be totally committed or totally decommitted; no mixture is supported.
> >> >> > a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
> >> >> > b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example, we could consistently monitor the in-use vs. free-but-committed ratio and try to stay in some "acceptable" region by varying our policy.
> >> >> > c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
> >> >> > d) We can change the interval and forcing function for decommitting spans that are in our free list.
> >> >> > In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively in our actions, and be vigilant for thrashing scenarios.
> >> >> >
> >> >> > Comments??
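One way to read option (b) above, as a rough sketch; the counters and the 25% threshold below are invented for illustration and are not taken from any actual Chromium code:

#include <cstddef>

// Hypothetical page-heap counters, all measured in pages.
struct HeapStats {
  size_t in_use;          // pages currently handed out to the application
  size_t free_committed;  // free-list pages that are still committed
};

// Option (b): keep the committed-but-free pool roughly proportional to what
// the process is actually using. While under the threshold, keep coalesced
// spans committed for cheap reuse; once over it, prefer decommitting them.
bool ShouldDecommitCoalescedSpan(const HeapStats& stats) {
  const double kMaxFreeCommittedRatio = 0.25;  // made-up tuning constant
  return stats.free_committed >
         static_cast<size_t>(kMaxFreeCommittedRatio * stats.in_use);
}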
> >> >>
> >> >> As a close attempt, you may have a look at http://codereview.chromium.org/256013/show
> >> >>
> >> >> That allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine if some pages are already committed.
> >> >>
> >> >> That has some minor performance benefit, but I haven't investigated it in detail yet.
> >> >>
> >> >> just my 2 cents,
> >> >> anton.
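A rough illustration of the mixed-state idea Anton points to above; the names and the three-state flag are invented here, so the review itself is the authoritative description of what the patch actually does:

#include <cstddef>
#include <cstdint>

// Invented three-state flag: spans parked in the returned list may contain
// both committed and decommitted pages.
enum CommitState { COMMITTED, DECOMMITTED, MIXED };

struct Span {
  uintptr_t start_page;  // first page number covered by the span
  size_t num_pages;      // span length, in pages
  CommitState state;
};

// Stand-in for the platform commit call. On Windows, VirtualAlloc with
// MEM_COMMIT succeeds even when part of the range is already committed,
// which is what makes leaving a span in a mixed state workable.
void CommitPages(uintptr_t start_page, size_t num_pages);

// When coalescing into the returned (decommitted) list, do not eagerly
// commit or decommit anything; just record that the merged span is a mix.
void MergeIntoReturnedList(Span* merged, bool had_committed_piece) {
  merged->state = had_committed_piece ? MIXED : DECOMMITTED;
}

// When the span is later pulled out for allocation, commit the whole range;
// pages that were already committed are simply committed again, harmlessly.
void PrepareForAllocation(Span* span) {
  if (span->state != COMMITTED) {
    CommitPages(span->start_page, span->num_pages);
    span->state = COMMITTED;
  }
}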
