On Wed, Sep 30, 2009 at 11:29 AM, Anton Muhin <ant...@chromium.org> wrote:

> On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <mbel...@google.com> wrote:
> >
> >
> > On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <ant...@chromium.org> wrote:
> >>
> >> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <mbel...@google.com> wrote:
> >> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <ant...@chromium.org>
> >> > wrote:
> >> >>
> >> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <mbel...@google.com>
> >> >> wrote:
> >> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <ant...@google.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <j...@google.com> wrote:
> >> >> >> > If you're not interested in TCMalloc customization for Chromium,
> >> >> >> > you
> >> >> >> > should
> >> >> >> > stop reading now.
> >> >> >> > This post is meant to gather some discussion on a topic before I
> >> >> >> > code
> >> >> >> > and
> >> >> >> > land a change.
> >> >> >> > MOTIVATION
> >> >> >> > We believe poor memory utilization is at the heart of a lot of
> >> >> >> > jank
> >> >> >> > problems.  Such problems may be difficult to repro in short
> >> >> >> > controlled
> >> >> >> > benchmarks, but our users are telling us we have problems, so we
> >> >> >> > know
> >> >> >> > we
> >> >> >> > have problems.  As a result, we need to be more conservative in
> >> >> >> > memory
> >> >> >> > utilization and handling.
> >> >> >> > SUMMARY OF CHANGE
> >> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed
> >> >> >> > into TCMalloc's free list and gets coalesced with an adjacent span
> >> >> >> > that is already decommitted, the coalesced span is entirely
> >> >> >> > decommitted (as opposed to our current customized behavior of
> >> >> >> > committing the entire span).
> >> >> >> > This proposed policy was put in place previously by Mike, but
> >> >> >> > (reportedly) caused a 3-5% perf regression in V8.  I believe AntonM
> >> >> >> > changed that policy to what we have currently, where we always
> >> >> >> > ensure full commitment of a coalesced span (regaining V8
> >> >> >> > performance on a benchmark).
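> >> >> >> > Roughly, the proposed policy amounts to the following when a freed
> >> >> >> > span is merged with a neighbor (a sketch only; the names here are
> >> >> >> > illustrative, not the exact identifiers in our tcmalloc page heap):
> >> >> >> >
> >> >> >> >   struct Span { bool decommitted; /* page range, links, ... */ };
> >> >> >> >   Span* Coalesce(Span* a, Span* b);   // combine the two page ranges
> >> >> >> >   void DecommitPages(Span* s);        // release pages back to the OS
> >> >> >> >
> >> >> >> >   void MergeFreedSpan(Span* freed, Span* neighbor) {
> >> >> >> >     bool either_decommitted = freed->decommitted || neighbor->decommitted;
> >> >> >> >     Span* merged = Coalesce(freed, neighbor);
> >> >> >> >     if (either_decommitted) {
> >> >> >> >       DecommitPages(merged);       // proposed: decommit the whole result
> >> >> >> >       merged->decommitted = true;
> >> >> >> >     }
> >> >> >> >     // The current behavior instead commits the entire merged span at
> >> >> >> >     // this point, which is what the change would remove.
> >> >> >> >   }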
> >> >> >>
> >> >> >> The immediate question and plea.  Question: how can we estimate
> >> >> >> performance implications of the change?  Yes, we have some internal
> >> >> >> benchmarks which could be used for that (they release memory
> >> >> >> heavily).  Anything else?
> >> >> >>
> >> >> >> Plea: please, do not regress DOM performance unless there are really
> >> >> >> compelling reasons.  And even in this case :)
> >> >> >
> >> >> > Anton -
> >> >> > All evidence from user complaints and bug reports are that Chrome
> >> >> > uses
> >> >> > too
> >> >> > much memory.  If you load Chrome on a 1GB system, you can feel it
> >> >> > yourself.
> >> >> >  Unfortunately, we have yet to build a reliable swapping benchmark.
> >> >> >  By
> >> >> > allowing tcmalloc to accumulate large chunks of unused pages, we
> >> >> > increase
> >> >> > the chance that paging will occur on the system.  But because
> paging
> >> >> > is
> >> >> > a
> >> >> > system-wide activity, it can hit our various processes in
> >> >> > unpredictable
> >> >> > ways
> >> >> > - and this leads to jank.  I think the jank is worse than the
> >> >> > benchmark
> >> >> > win.
> >> >> > I wish we had a better way to quantify the damage caused by paging.
> >> >> >  Jim
> >> >> > and
> >> >> > others are working on that.
> >> >> > But it's clear to me that we're just being a memory pig for what is
> >> >> > really a
> >> >> > modest gain on a semi-obscure benchmark right now.  Using the
> current
> >> >> > algorithms, we have literally multi-hundred megabyte memory usage
> >> >> > swings
> >> >> > in
> >> >> > exchange for 3% on a benchmark.  Don't you agree this is the wrong
> >> >> > tradeoff?
> >> >> >  (DOM benchmark grows to 500+MB right now; when you switch tabs it
> >> >> > drops
> >> >> > to
> >> >> > <100MB).  Other pages have been witnessed which have similar
> behavior
> >> >> > (loading the histograms page).
> >> >> > We may be able to put in some algorithms which are more aware of
> the
> >> >> > current
> >> >> > available memory going forward, but I agree with Jim that there
> will
> >> >> > be
> >> >> > a
> >> >> > lot of negative effects as long as we continue to have such large
> >> >> > memory
> >> >> > swings.
> >> >>
> >> >> Mike, I completely agree that we should reduce memory usage.  On the
> >> >> other hand, speed has always been one of Chrome's trademarks.  My
> >> >> feeling is that more committed pages in the free list make us faster
> >> >> (but yes, there is paging etc.).  That's exactly the reason I asked for
> >> >> some way to quantify the quality of different approaches, esp. given
> >> >> the classic memory vs. speed dilemma; ideally (imho) both speed and
> >> >> memory usage should be considered.
> >> >
> >> > The team is working on benchmarks.
> >> > I think the evidence of paging is pretty overwhelming.
> >> > Paging and jank are far worse than the small perf boost on DOM node
> >> > creation.  I don't believe the benchmark in question is a significant
> >> > driver of primary performance.  Do you?
> >>
> >> To some extent.  Just to make it clear: I am not insisting; if the
> >> consensus is that we should trade DOM performance for reduced memory
> >> usage in this case, that's fine.  I only want to have real numbers
> >> before we make any decision.
> >>
> >> @pkasting: it wasn't 3%, it was closer to 8% if memory serves.
> >
> > When I checked it in, my records show a 217 -> 210 benchmark drop, which
> > is 3%.
>
> My numbers were substantially bigger, but anyway we need to remeasure
> it---there are too many factors.
>

I did some measurements on my Windows machine comparing the current behavior
(always commit spans when merging them together) with a very conservative
alternative (always decommit spans on ::Delete, including the just-released
one).  The interesting bits are the benchmark scores and memory use at the
end of the run.
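
In other words, the conservative variant amounts to roughly this at the end
of ::Delete (a sketch only; the names are illustrative rather than the actual
patch):

  struct Span { bool decommitted; /* page range, links, ... */ };
  Span* MergeIntoFreeList(Span* span);  // coalesce with adjacent free spans
  void DecommitPages(Span* span);       // e.g. VirtualFree(MEM_DECOMMIT) on Windows

  void DeleteSpan(Span* span) {
    Span* merged = MergeIntoFreeList(span);
    if (!merged->decommitted) {
      DecommitPages(merged);            // always give the pages back
      merged->decommitted = true;
    }
  }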

For the DOM benchmark, the score regressed from an average over 4 runs of
188.25 to 185, which is <2%.  The peak memory is about the same, but the
memory committed by the tab at the end of the run decreased from an average
of 642MB to 57MB, which is a 91% reduction.  4 runs probably isn't enough to
make a definitive statement about the perf impact, but I think the memory
impact is pretty clear.  The memory characteristics of the V8 benchmark were
unchanged, but the performance dropped from an average of 3009 to 2944, which
is about 2%.  SunSpider did not change at all in either memory or
performance.

- James

>
> yours,
> anton.
>
> >>
> >> And one thing I forgot.  Regarding the policy of decommitting spans in
> >> ::Delete.  Please correct me if I'm wrong, but wouldn't that actually
> >> make all the free spans decommitted---a span would only be committed
> >> when it gets allocated, no?  Decommitting only if any of the adjacent
> >> spans is decommitted may keep some spans committed, but it's difficult
> >> for me to say how often.
> >
> > Oh - more work is still needed, yes :-)
> >
> > Mike
> >
> >>
> >> yours,
> >> anton.
> >>
> >> > Mike
> >> >
> >> >>
> >> >> yours,
> >> >> anton.
> >> >>
> >> >> > Mike
> >> >> >
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> > WHY CHANGE?
> >> >> >> > The problematic scenario I'm anticipating (and which may currently
> >> >> >> > be burning us) is:
> >> >> >> > a) A (renderer) process allocates a lot of memory, and reaches a
> >> >> >> > significant high-water mark of memory used.
> >> >> >> > b) The process deallocates a lot of memory, and it flows into the
> >> >> >> > TCMalloc free list.  [We still have a lot of memory attributed to
> >> >> >> > that process, and the app as a whole shows as using that memory.]
> >> >> >> > c) We eventually decide to decommit a lot of our free memory.
> >> >> >> > Currently this happens when we switch away from a tab.  [This saves
> >> >> >> > us from further swapping out the unused memory.]
> >> >> >> > Now comes the evil problem.
> >> >> >> > d) We return to the tab, which has a giant free list of spans, most
> >> >> >> > of which are decommitted.  [The good news is that the memory is
> >> >> >> > still decommitted.]
> >> >> >> > e) We allocate a block of memory, such as a 32KB chunk.  This
> >> >> >> > memory is pulled from a decommitted span, and ONLY the allocated
> >> >> >> > chunk is committed.  [That sounds good.]
> >> >> >> > f) We free the block of memory from (e).  Whatever span is adjacent
> >> >> >> > to that block gets committed <potential oops>.  Hence, if we took
> >> >> >> > (e) from a 200MB span, the act of freeing (e) will cause a 200MB
> >> >> >> > commitment!?!  This in turn would not only require touching (and
> >> >> >> > having VirtualAlloc clear to zero) all allocated memory in the
> >> >> >> > large span, it will also immediately put memory pressure on the OS,
> >> >> >> > and force as much as 200MB of other apps' memory to be swapped out
> >> >> >> > to disk :-(.
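> >> >> >> > For reference, the commit in step (f) boils down to roughly the
> >> >> >> > following on Windows (a sketch of the general mechanism only, not
> >> >> >> > our exact tcmalloc port layer):
> >> >> >> >
> >> >> >> >   #include <windows.h>
> >> >> >> >   #include <cstddef>
> >> >> >> >
> >> >> >> >   void CommitRegion(void* start, size_t length) {
> >> >> >> >     // Charges the pages against system commit immediately, even
> >> >> >> >     // before they are touched; they are zero-filled on first access.
> >> >> >> >     VirtualAlloc(start, length, MEM_COMMIT, PAGE_READWRITE);
> >> >> >> >   }
> >> >> >> >
> >> >> >> >   void DecommitRegion(void* start, size_t length) {
> >> >> >> >     // Releases the commit charge but keeps the address range reserved.
> >> >> >> >     VirtualFree(start, length, MEM_DECOMMIT);
> >> >> >> >   }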
> >> >> >>
> >> >> >> I'm not sure swapping happens unless you touch those now-committed
> >> >> >> pages, but only an experiment will tell.
> >> >> >>
> >> >> >> > I'm wary that our recent fix that allows spans to be (correctly)
> >> >> >> > coalesced independent of their size will make it easier to coalesce
> >> >> >> > spans.  Worse yet, as we proceed to further optimize TCMalloc, one
> >> >> >> > measure of success will be that the list of spans is fragmented
> >> >> >> > less and less, and we'll have larger and larger coalesced singular
> >> >> >> > spans.  Any large "reserved" but not "committed" span will be a
> >> >> >> > jank time-bomb waiting to blow up if the process ever
> >> >> >> > allocates/frees from such a large span :-(.
> >> >> >> >
> >> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain
> >> >> >> > performance, etc.)
> >> >> >> > We have at least the following plausible alternative ways to move
> >> >> >> > forward with TCMalloc.  The overall goal is to avoid wasteful
> >> >> >> > decommits, and at the same time avoid heap-wide flailing between
> >> >> >> > minimal and maximal span commitment states.
> >> >> >> > Each free span is currently the maximal contiguous region of memory
> >> >> >> > that TCMalloc is controlling but has been deallocated.  Currently
> >> >> >> > spans have to be totally committed or totally decommitted.  There
> >> >> >> > is no mixture supported.
> >> >> >> > a) We could re-architect the span handling to allow spans to be
> >> >> >> > combinations of committed and decommitted regions.
> >> >> >> > b) We could vary our policy on what to do with a coalesced span,
> >> >> >> > based on span size and memory pressure.  For example: we can
> >> >> >> > consistently monitor the in-use vs. free (but committed) ratio, and
> >> >> >> > try to stay in some "acceptable" region by varying our policy (a
> >> >> >> > rough sketch follows this list).
> >> >> >> > c) We could actually return to the OS some portions of spans that
> >> >> >> > we have decommitted.  We could then let the OS give us back these
> >> >> >> > regions if we need memory.  Until we get them back, we would not be
> >> >> >> > at risk of doing unnecessary commits.  Decisions about when to
> >> >> >> > return memory to the OS can be made based on span size and memory
> >> >> >> > pressure.
> >> >> >> > d) We can change the interval and forcing function for decommitting
> >> >> >> > spans that are in our free list.
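> >> >> >> > For (b), the decision might look roughly like the following (purely
> >> >> >> > a sketch; the threshold, the bookkeeping names, and the stats source
> >> >> >> > are all hypothetical):
> >> >> >> >
> >> >> >> >   #include <cstddef>
> >> >> >> >
> >> >> >> >   // Decide whether to decommit a freshly coalesced span based on how
> >> >> >> >   // much free-but-committed memory the heap already holds relative
> >> >> >> >   // to in-use memory.
> >> >> >> >   bool ShouldDecommitCoalescedSpan(size_t span_bytes,
> >> >> >> >                                    size_t in_use_bytes,
> >> >> >> >                                    size_t free_committed_bytes) {
> >> >> >> >     const double kMaxFreeCommittedRatio = 0.25;  // hypothetical knob
> >> >> >> >     if (span_bytes >= 8 * 1024 * 1024)
> >> >> >> >       return true;  // very large spans are always worth decommitting
> >> >> >> >     double ratio = static_cast<double>(free_committed_bytes) /
> >> >> >> >                    static_cast<double>(in_use_bytes + 1);
> >> >> >> >     return ratio > kMaxFreeCommittedRatio;
> >> >> >> >   }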
> >> >> >> > In each of the above cases, we need benchmark data on user-class
> >> >> >> > machines to show the costs of these changes.  Until we understand
> >> >> >> > the memory impact, we need to move forward conservatively, and be
> >> >> >> > vigilant for thrashing scenarios.
> >> >> >> >
> >> >> >> > Comments??
> >> >> >>
> >> >> >> As a closely related attempt, you may have a look at
> >> >> >> http://codereview.chromium.org/256013/show
> >> >> >>
> >> >> >> That allows spans with a mix of committed/decommitted pages (but only
> >> >> >> in the returned list), since committing seems to work fine if some
> >> >> >> pages are already committed.
> >> >> >>
> >> >> >> That has some minor performance benefit, but I haven't investigated
> >> >> >> it in detail yet.
> >> >> >>
> >> >> >> just my 2 cents,
> >> >> >> anton.
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>
