On Wed, Sep 30, 2009 at 2:28 PM, James Robinson <[email protected]> wrote:

> On Wed, Sep 30, 2009 at 11:29 AM, Anton Muhin <[email protected]> wrote:
>
>> On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <[email protected]> wrote:
>> >
>> >
>> > On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
>> >>
>> >> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
>> >> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]>
>> >> >> wrote:
>> >> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>> >> >> >> > If you're not interested in TCMalloc customization for Chromium, you
>> >> >> >> > should stop reading now.
>> >> >> >> > This post is meant to gather some discussion on a topic before I code
>> >> >> >> > and land a change.
>> >> >> >> >
>> >> >> >> > MOTIVATION
>> >> >> >> > We believe poor memory utilization is at the heart of a lot of jank
>> >> >> >> > problems.  Such problems may be difficult to repro in short controlled
>> >> >> >> > benchmarks, but our users are telling us we have problems, so we know
>> >> >> >> > we have problems.  As a result, we need to be more conservative in
>> >> >> >> > memory utilization and handling.
>> >> >> >> >
>> >> >> >> > SUMMARY OF CHANGE
>> >> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed
>> >> >> >> > into TCMalloc's free list, and it gets coalesced with an adjacent span
>> >> >> >> > that is already decommitted, the coalesced span is entirely
>> >> >> >> > decommitted (as opposed to our current customized behavior of
>> >> >> >> > committing the entire span).
>> >> >> >> > This proposed policy was put in place previously by Mike, but
>> >> >> >> > (reportedly) caused a 3-5% perf regression in V8.  I believe AntonM
>> >> >> >> > changed that policy to what we have currently, where we always ensure
>> >> >> >> > full commitment of a coalesced span (regaining V8 performance on a
>> >> >> >> > benchmark).
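For concreteness, a rough sketch of the proposed policy as I read it (all
names here are illustrative, not the actual TCMalloc source):

  #include <stddef.h>
  #include <stdint.h>

  struct Span {
    uintptr_t start;      // first page of the span
    size_t    num_pages;  // span length in pages
    bool      committed;  // whole-span commit state (no mixed spans today)
  };

  // Assumed helpers (hypothetical): merge adjacent free spans / decommit pages.
  Span* Coalesce(Span* freed, Span* prev, Span* next);
  void DecommitPages(uintptr_t start, size_t num_pages);

  void MergeIntoFreeList(Span* freed, Span* prev, Span* next) {
    // Was either neighbor already decommitted before we coalesce?
    bool neighbor_decommitted =
        (prev != NULL && !prev->committed) ||
        (next != NULL && !next->committed);
    Span* merged = Coalesce(freed, prev, next);
    if (neighbor_decommitted) {
      // Proposed policy: decommit the whole merged span now, so a later
      // small free next to it can't trigger a huge re-commit.
      DecommitPages(merged->start, merged->num_pages);
      merged->committed = false;
    } else {
      // Everything being merged was already committed, so the merged span
      // stays committed (both the old and the new policy agree here).
      merged->committed = true;
    }
  }

(The current customized behavior would instead commit the entire merged span
in the first branch.)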
>> >> >> >>
>> >> >> >> The immediate question and plea.  Question: how can we estimate
>> >> >> >> performance implications of the change?  Yes, we have some internal
>> >> >> >> benchmarks which could be used for that (they release memory heavily).
>> >> >> >> Anything else?
>> >> >> >>
>> >> >> >> Plea: please, do not regress DOM performance unless there are really
>> >> >> >> compelling reasons.  And even in this case :)
>> >> >> >
>> >> >> > Anton -
>> >> >> > All evidence from user complaints and bug reports is that Chrome uses
>> >> >> > too much memory.  If you load Chrome on a 1GB system, you can feel it
>> >> >> > yourself.  Unfortunately, we have yet to build a reliable swapping
>> >> >> > benchmark.  By allowing tcmalloc to accumulate large chunks of unused
>> >> >> > pages, we increase the chance that paging will occur on the system.
>> >> >> > But because paging is a system-wide activity, it can hit our various
>> >> >> > processes in unpredictable ways - and this leads to jank.  I think the
>> >> >> > jank is worse than the benchmark win.
>> >> >> > I wish we had a better way to quantify the damage caused by paging.
>> >> >> > Jim and others are working on that.
>> >> >> > But it's clear to me that we're just being a memory pig for what is
>> >> >> > really a modest gain on a semi-obscure benchmark right now.  Using the
>> >> >> > current algorithms, we have literally multi-hundred megabyte memory
>> >> >> > usage swings in exchange for 3% on a benchmark.  Don't you agree this
>> >> >> > is the wrong tradeoff?  (The DOM benchmark grows to 500+MB right now;
>> >> >> > when you switch tabs it drops to <100MB.)  Other pages have been
>> >> >> > observed with similar behavior (loading the histograms page).
>> >> >> > We may be able to put in some algorithms which are more aware of the
>> >> >> > currently available memory going forward, but I agree with Jim that
>> >> >> > there will be a lot of negative effects as long as we continue to have
>> >> >> > such large memory swings.
>> >> >>
>> >> >> Mike, I completely agree that we should reduce memory usage.  On the
>> >> >> other hand, speed has always been one of Chrome's trademarks.  My
>> >> >> feeling is that more committed pages in the free list make us faster
>> >> >> (but yes, there is paging etc.).  That's exactly the reason I asked for
>> >> >> some way to quantify the quality of the different approaches, esp.
>> >> >> given the classic memory vs. speed dilemma; ideally (imho) both speed
>> >> >> and memory usage should be considered.
>> >> >
>> >> > The team is working on benchmarks.
>> >> > I think the evidence of paging is pretty overwhelming.
>> >> > Paging and jank are far worse than the small perf boost on DOM node
>> >> > creation.  I don't believe the benchmark in question is a significant
>> >> > driver of primary performance.  Do you?
>> >>
>> >> To some extent.  Just to make it clear: I am not insisting; if the
>> >> consensus is that we should trade DOM performance for reduced memory
>> >> usage in this case, that's fine.  I only want to have real numbers
>> >> before we make any decision.
>> >>
>> >> @pkasting: it wasn't 3%, it was closer to 8% if memory serves.
>> >
>> > When I checked it in, my records show a 217 -> 210 benchmark drop, which
>> > is 3%.
>>
>> My numbers were substantially bigger, but anyway we need to remeasure
>> it---there are too many factors.
>>
>
> I did some measurements on my Windows machine comparing the current
> behavior (always commit spans when merging them together) with a very
> conservative alternative (always decommit spans on ::Delete, including the
> just released one).  The interesting bits are the benchmark scores and
> memory use at the end of the run.
>
> For the DOM benchmark, the score regressed from an average over 4 runs of
> 188.25 to 185, which is <2%.  The peak memory is about the same, but the
> memory committed by the tab at the end of the run decreased from an average
> of 642MB to 57MB, which is a 91% reduction.  4 runs probably isn't enough to
> make a definitive statement about the perf impact, but I think the memory
> impact is pretty clear.  The memory characteristics of the V8 benchmark were
> unchanged, but the performance dropped from an average of 3009 to 2944,
> which is about 2%.  SunSpider did not change at all in either memory or
> performance.
>

Sorry, disregard those DOM numbers (I wasn't running the right test).

I re-ran on Dromaeo's DOM Core test suite twice with and without the
aggressive decommitting, and the numbers are:

r23768 unmodified:
scores: 299.36 run/s  302.47 run/s
memory footprint of renderer at end of run: 333,648KB 334,156KB

r23768 with decommitting:
scores: 296.06 run/s  293.88 run/s
memory footprint of renderer at end of run: 91,856KB 68,208KB

I think if the tradeoff is <2% perf versus 3-5x memory use, it's better to
get more conservative with our memory use first and then figure out how to
earn back the perf impact without blowing the memory use sky-high again.  I
think it's pretty clear we don't need all 200MB of extra committed memory in
order to do 3 more runs per second.
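For reference, the "aggressive decommit" variant measured above is roughly
this shape (a sketch with made-up names, not the actual patch):

  #include <stddef.h>
  #include <stdint.h>

  struct Span {
    uintptr_t start;
    size_t    num_pages;
    bool      committed;
  };

  // Assumed helpers (hypothetical names).
  Span* CoalesceWithNeighbors(Span* span);
  void DecommitPages(uintptr_t start, size_t num_pages);
  void InsertIntoFreeList(Span* span);

  // Conservative policy: on every Delete, merge and decommit the whole
  // result, including the block that was just freed.
  void PageHeap_Delete(Span* span) {
    Span* merged = CoalesceWithNeighbors(span);
    DecommitPages(merged->start, merged->num_pages);
    merged->committed = false;
    InsertIntoFreeList(merged);
    // A later allocation recommits only the pages it actually hands out,
    // so the cost is at most one extra decommit/commit round trip per
    // alloc/free pair, which is presumably where the ~2% hit shows up.
  }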

- James


> - James
>
>>
>> yours,
>> anton.
>>
>> >>
>> >> And one thing I forgot.  Regarding the policy to decommit spans in
>> >> ::Delete.  Please correct me if I'm wrong, but wouldn't that actually
>> >> make all the free spans decommitted---a span would only be committed
>> >> when it gets allocated, no?  Decommitting only if any of the adjacent
>> >> spans is decommitted may keep some spans committed, but it's difficult
>> >> for me to say how often.
>> >
>> > Oh - more work is still needed, yes :-)
>> >
>> > Mike
>> >
>> >>
>> >> yours,
>> >> anton.
>> >>
>> >> > Mike
>> >> >
>> >> >>
>> >> >> yours,
>> >> >> anton.
>> >> >>
>> >> >> > Mike
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> > WHY CHANGE?
>> >> >> >> > The problematic scenario I'm anticipating (and may currently be
>> >> >> >> > burning us) is:
>> >> >> >> > a) A (renderer) process allocates a lot of memory, and achieves a
>> >> >> >> > significant high water mark of memory used.
>> >> >> >> > b) The process deallocates a lot of memory, and it flows into the
>> >> >> >> > TCMalloc free list.  [We still have a lot of memory attributed to
>> >> >> >> > that process, and the app as a whole shows as using that memory.]
>> >> >> >> > c) We eventually decide to decommit a lot of our free memory.
>> >> >> >> > Currently this happens when we switch away from a tab.  [This saves
>> >> >> >> > us from further swapping out the unused memory.]
>> >> >> >> > Now comes the evil problem.
>> >> >> >> > d) We return to the tab which has a giant free list of spans, most
>> >> >> >> > of which are decommitted.  [The good news is that the memory is
>> >> >> >> > still decommitted.]
>> >> >> >> > e) We allocate a block of memory, such as a 32k chunk.  This memory
>> >> >> >> > is pulled from a decommitted span, and ONLY the allocated chunk is
>> >> >> >> > committed.  [That sounds good.]
>> >> >> >> > f) We free the block of memory from (e).  Whatever span is adjacent
>> >> >> >> > to that block is committed <potential oops>.  Hence, if we took (e)
>> >> >> >> > from a 200Meg span, the act of freeing (e) will cause a 200Meg
>> >> >> >> > commitment!?!  This in turn would not only require touching (and
>> >> >> >> > having VirtualAlloc clear to zero) all allocated memory in the large
>> >> >> >> > span, it will also immediately put memory pressure on the OS, and
>> >> >> >> > force as much as 200Megs of other apps to be swapped out to disk :-(.
>> >> >> >>
>> >> >> >> I'm not sure about swapping unless you touch those now-committed
>> >> >> >> pages, but only an experiment will tell.
>> >> >> >>
>> >> >> >> > I'm wary because our recent fix that allows spans to be (correctly)
>> >> >> >> > coalesced independent of their size should make it easier to
>> >> >> >> > coalesce spans.  Worse yet, as we proceed to further optimize
>> >> >> >> > TCMalloc, one measure of success will be that the list of spans
>> >> >> >> > will be fragmented less and less, and we'll have larger and larger
>> >> >> >> > coalesced singular spans.  Any large "reserved" but not "committed"
>> >> >> >> > span will be a jank time-bomb waiting to blow up if the process
>> >> >> >> > ever allocates/frees from such a large span :-(.
>> >> >> >> >
>> >> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain
>> >> >> >> > performance, etc.)
>> >> >> >> > We have at least the following plausible alternative ways to move
>> >> >> >> > forward with TCMalloc.  The overall goal is to avoid wasteful
>> >> >> >> > decommits, and at the same time avoid heap-wide flailing between
>> >> >> >> > minimal and maximal span commitment states.
>> >> >> >> > Each free span is currently the maximal contiguous region of memory
>> >> >> >> > that TCMalloc is controlling, but has been deallocated.  Currently
>> >> >> >> > spans have to be totally committed, or totally decommitted.  There
>> >> >> >> > is no mixture supported.
>> >> >> >> > a) We could re-architect the span handling to allow spans to be
>> >> >> >> > combinations of committed and decommitted regions.
>> >> >> >> > b) We could vary our policy on what to do with a coalesced span,
>> >> >> >> > based on span size and memory pressure.  For example: we can
>> >> >> >> > consistently monitor the in-use vs. free (but committed) ratio, and
>> >> >> >> > try to stay in some "acceptable" region by varying our policy.
>> >> >> >> > c) We could actually return to the OS some portions of spans that we
>> >> >> >> > have decommitted.  We could then let the OS give us back these
>> >> >> >> > regions if we need memory.  Until we get them back, we would not be
>> >> >> >> > at risk of doing unnecessary commits.  Decisions about when to
>> >> >> >> > return to the OS can be made based on span size and memory pressure.
>> >> >> >> > d) We can change the interval and forcing function for decommitting
>> >> >> >> > spans that are in our free list.
>> >> >> >> > In each of the above cases, we need benchmark data on user-class
>> >> >> >> > machines to show the costs of these changes.  Until we understand
>> >> >> >> > the memory impact, we need to move forward conservatively, and be
>> >> >> >> > vigilant for thrashing scenarios.
>> >> >> >> >
>> >> >> >> > Comments??
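To make option (b) slightly more concrete, one possible shape (the threshold
and all names are hypothetical, purely illustrative):

  #include <stddef.h>
  #include <stdint.h>

  struct Span {
    uintptr_t start;
    size_t    num_pages;
    bool      committed;
  };

  struct HeapStats {
    size_t in_use_pages;          // pages handed out to the application
    size_t free_committed_pages;  // pages sitting in the free list, committed
  };

  // Assumed helpers (hypothetical names).
  Span* LargestCommittedFreeSpan();
  void DecommitPages(uintptr_t start, size_t num_pages);

  // Example policy: keep free-but-committed pages under some fraction of
  // the pages actually in use; decommit the largest free spans until we
  // are back under that budget.
  const double kMaxFreeCommittedRatio = 0.25;  // example value, tunable

  void MaybeDecommitFreeSpans(HeapStats* stats) {
    size_t budget = static_cast<size_t>(
        stats->in_use_pages * kMaxFreeCommittedRatio);
    while (stats->free_committed_pages > budget) {
      Span* victim = LargestCommittedFreeSpan();
      if (victim == NULL) break;
      DecommitPages(victim->start, victim->num_pages);
      victim->committed = false;
      stats->free_committed_pages -= victim->num_pages;
    }
  }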
>> >> >> >>
>> >> >> >> As a related attempt, you may have a look at
>> >> >> >> http://codereview.chromium.org/256013/show
>> >> >> >>
>> >> >> >> That allows spans with a mix of committed/decommitted pages (but only
>> >> >> >> in the returned list), as committing seems to work fine if some pages
>> >> >> >> are already committed.
>> >> >> >>
>> >> >> >> That has some minor performance benefit, but I didn't investigate it
>> >> >> >> in detail yet.
>> >> >> >>
>> >> >> >> just my 2 cents,
>> >> >> >> anton.
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>>
>
>
