On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <[email protected]> wrote:
>
> On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <[email protected]> wrote:
>>
>> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <[email protected]> wrote:
>> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
>> >>
>> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
>> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
>> >> >>
>> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>> >> >> > If you're not interested in TCMalloc customization for Chromium, you should stop reading now.
>> >> >> > This post is meant to gather some discussion on a topic before I code and land a change.
>> >> >> >
>> >> >> > MOTIVATION
>> >> >> > We believe poor memory utilization is at the heart of a lot of jank problems. Such problems may be difficult to repro in short controlled benchmarks, but our users are telling us we have problems, so we know we have problems. As a result, we need to be more conservative in memory utilization and handling.
>> >> >> >
>> >> >> > SUMMARY OF CHANGE
>> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed into TCMalloc's free list, and it gets coalesced with an adjacent span that is already decommitted, the coalesced span should be entirely decommitted (as opposed to our current customized behavior of committing the entire span).
>> >> >> > This proposed policy was put in place previously by Mike, but (reportedly) caused a 3-5% perf regression in V8. I believe AntonM changed that policy to what we have currently, where we always ensure full commitment of a coalesced span (regaining V8 performance on a benchmark).
>> >> >>
>> >> >> The immediate question and plea. Question: how can we estimate the performance implications of the change? Yes, we have some internal benchmarks which could be used for that (they release memory heavily). Anything else?
>> >> >>
>> >> >> Plea: please, do not regress DOM performance unless there are really compelling reasons. And even in this case :)
>> >> >
>> >> > Anton -
>> >> > All evidence from user complaints and bug reports is that Chrome uses too much memory. If you load Chrome on a 1GB system, you can feel it yourself. Unfortunately, we have yet to build a reliable swapping benchmark. By allowing tcmalloc to accumulate large chunks of unused pages, we increase the chance that paging will occur on the system. But because paging is a system-wide activity, it can hit our various processes in unpredictable ways - and this leads to jank. I think the jank is worse than the benchmark win. I wish we had a better way to quantify the damage caused by paging. Jim and others are working on that.
>> >> > But it's clear to me that we're just being a memory pig for what is really a modest gain on a semi-obscure benchmark right now.
>> >> > Using the current algorithms, we have literally multi-hundred megabyte memory usage swings in exchange for 3% on a benchmark. Don't you agree this is the wrong tradeoff? (The DOM benchmark grows to 500+MB right now; when you switch tabs it drops to <100MB.) Other pages have been witnessed which have similar behavior (loading the histograms page).
>> >> > We may be able to put in some algorithms which are more aware of the currently available memory going forward, but I agree with Jim that there will be a lot of negative effects as long as we continue to have such large memory swings.
>> >>
>> >> Mike, I completely agree that we should reduce memory usage. On the other hand, speed was always one of Chrome's trademarks. My feeling is that more committed pages in the free list make us faster (but yes, there is paging etc.). That's exactly the reason I asked for some way to quantify the quality of different approaches, esp. given the classic memory vs. speed dilemma; ideally (imho) both speed and memory usage should be considered.
>> >
>> > The team is working on benchmarks.
>> > I think the evidence of paging is pretty overwhelming.
>> > Paging and jank are far worse than the small perf boost on DOM node creation.
>> > I don't believe the benchmark in question is a significant driver of primary performance. Do you?
>>
>> To some extent. Just to make it clear: I am not insisting; if the consensus is that we should trade performance in DOM for reduced memory usage in this case, that's fine. I only want to have real numbers before we make any decision.
>>
>> @pkasting: it wasn't 3%, it was closer to 8% if memory serves.
>
> When I checked it in, my records show a 217 -> 210 benchmark drop, which is 3%.
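For readers following along in the allocator, below is a minimal, self-contained sketch of the two coalescing policies being compared in this exchange. It is not the actual third_party/tcmalloc code: the Span fields and the CommitPages/DecommitPages helpers are simplified stand-ins for the real page-heap bookkeeping and the underlying VirtualAlloc/VirtualFree (or madvise) calls.

  // Illustrative sketch only, NOT the real Chromium tcmalloc sources.
  #include <cstddef>
  #include <cstdio>

  struct Span {
    size_t start_page = 0;   // first page covered by this span
    size_t num_pages = 0;    // length in pages
    bool committed = true;   // are the backing pages committed?
  };

  // Stand-ins for the real OS calls; in the allocator these would wrap
  // VirtualAlloc(MEM_COMMIT) / VirtualFree(MEM_DECOMMIT) on Windows.
  void CommitPages(Span* s)   { std::printf("commit   %zu pages\n", s->num_pages); }
  void DecommitPages(Span* s) { std::printf("decommit %zu pages\n", s->num_pages); }

  // Called when `freed` is returned to the page heap and found to be
  // adjacent to `neighbor`, which is already sitting on a free list.
  void CoalesceSpans(Span* freed, Span* neighbor) {
    const bool any_decommitted = !freed->committed || !neighbor->committed;

    if (any_decommitted) {
  #if defined(POLICY_COMMIT_COALESCED)
      // Current Chromium behavior described in the thread: make the merged
      // span fully committed.  Cheap for the next allocation, but if the
      // neighbor is a huge decommitted span this commits all of it at once.
      if (!freed->committed) CommitPages(freed);
      if (!neighbor->committed) CommitPages(neighbor);
      freed->committed = neighbor->committed = true;
  #else
      // Jim's proposal: if either side is already decommitted, decommit the
      // whole merged span, so freeing a small block never forces a large
      // commit of an adjacent multi-hundred-megabyte region.
      if (freed->committed) DecommitPages(freed);
      if (neighbor->committed) DecommitPages(neighbor);
      freed->committed = neighbor->committed = false;
  #endif
    }

    // Merge bookkeeping: grow `neighbor` to cover both regions.
    if (freed->start_page < neighbor->start_page)
      neighbor->start_page = freed->start_page;
    neighbor->num_pages += freed->num_pages;
  }

Which branch is compiled in is exactly the policy difference behind the disputed 3% vs. 8% benchmark numbers.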
My numbers were substantially bigger, but anyway we need to remeasure it---there are too many factors.

yours,
anton.

>> And something I forgot. Regarding the policy of decommitting spans in ::Delete: please correct me if I'm wrong, but wouldn't that actually make all the free spans decommitted---a span would only be committed when it gets allocated, no? Decommitting only if one of the adjacent spans is decommitted may keep some spans committed, but it's difficult for me to say how often.
>
> Oh - more work is still needed, yes :-)
>
> Mike
>
>> yours,
>> anton.
>>
>> > Mike
>> >
>> >> yours,
>> >> anton.
>> >>
>> >> > Mike
>> >> >
>> >> >> > WHY CHANGE?
>> >> >> > The problematic scenario I'm anticipating (and which may currently be burning us) is:
>> >> >> > a) A (renderer) process allocates a lot of memory, and achieves a significant high water mark of memory used.
>> >> >> > b) The process deallocates a lot of memory, and it flows into the TCMalloc free list. [We still have a lot of memory attributed to that process, and the app as a whole shows as using that memory.]
>> >> >> > c) We eventually decide to decommit a lot of our free memory. Currently this happens when we switch away from a tab. [This saves us from further swapping out the unused memory.]
>> >> >> > Now comes the evil problem.
>> >> >> > d) We return to the tab which has a giant free list of spans, most of which are decommitted. [The good news is that the memory is still decommitted.]
>> >> >> > e) We allocate a block of memory, such as a 32k chunk. This memory is pulled from a decommitted span, and ONLY the allocated chunk is committed. [That sounds good.]
>> >> >> > f) We free the block of memory from (e). Whatever span is adjacent to that block is committed <potential oops>. Hence, if we took (e) from a 200Meg span, the act of freeing (e) will cause a 200Meg commitment!?! This in turn would not only require touching (and having VirtualAlloc clear to zero) all allocated memory in the large span, it will also immediately put memory pressure on the OS, and force as much as 200Megs of other apps to be swapped out to disk :-(.
>> >> >>
>> >> >> I'm not sure about swapping unless you touch those now-committed pages, but only experiment will tell.
>> >> >>
>> >> >> > I'm wary that our recent fix that allows spans to be (correctly) coalesced independent of their size will make it easier to coalesce spans. Worse yet, as we proceed to further optimize TCMalloc, one measure of success will be that the list of spans will be fragmented less and less, and we'll have larger and larger coalesced singular spans. Any large "reserved" but not "committed" span will be a jank time-bomb waiting to blow up if the process ever allocates/frees from such a large span :-(.
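To put a number on the "time bomb" in step (f) of the WHY CHANGE scenario above, here is a toy, standalone walk-through of how much memory ends up committed under the current commit-on-coalesce policy. The 200 MB span and 32 KB chunk are just the figures Jim uses; nothing here is measured data.

  // Toy walk-through of steps (d)-(f) above under commit-on-coalesce.
  #include <cstdio>

  int main() {
    const long long kMB = 1024 * 1024;
    long long committed = 0;           // bytes currently committed

    // (d) Back on the tab: one giant 200 MB free span, fully decommitted.
    long long free_span = 200 * kMB;
    std::printf("(d) committed: %lld MB\n", committed / kMB);

    // (e) Allocate a 32 KB chunk: carve it out and commit only that chunk.
    const long long chunk = 32 * 1024;
    free_span -= chunk;
    committed += chunk;
    std::printf("(e) committed: %lld KB\n", committed / 1024);

    // (f) Free the chunk.  It coalesces with the adjacent decommitted span,
    // and the commit-on-coalesce policy commits the whole merged span.
    committed += free_span;            // ~200 MB committed by one small free
    std::printf("(f) committed: %lld MB after freeing a 32 KB chunk\n",
                committed / kMB);
    // Under the proposed decommit-on-coalesce policy, (f) would instead
    // decommit the 32 KB chunk and leave committed memory near zero.
    return 0;
  }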
>> >> >> >
>> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain performance, etc.)
>> >> >> > We have at least the following plausible alternative ways to move forward with TCMalloc. The overall goal is to avoid wasteful decommits, and at the same time avoid heap-wide flailing between minimal and maximal span commitment states.
>> >> >> > Each free span is currently the maximal contiguous region of memory that TCMalloc is controlling but has been deallocated. Currently spans have to be totally committed, or totally decommitted. There is no mixture supported.
>> >> >> > a) We could re-architect the span handling to allow spans to be combinations of committed and decommitted regions.
>> >> >> > b) We could vary our policy on what to do with a coalesced span, based on span size and memory pressure. For example: we can consistently monitor the in-use vs. free (but committed) ratio. We can try to stay in some "acceptable" region by varying our policy.
>> >> >> > c) We could actually return to the OS some portions of spans that we have decommitted. We could then let the OS give us back these regions if we need memory. Until we get them back, we would not be at risk of doing unnecessary commits. Decisions about when to return memory to the OS can be made based on span size and memory pressure.
>> >> >> > d) We can change the interval and forcing function for decommitting spans that are in our free list.
>> >> >> > In each of the above cases, we need benchmark data on user-class machines to show the costs of these changes. Until we understand the memory impact, we need to move forward conservatively in our actions, and be vigilant for thrashing scenarios.
>> >> >> >
>> >> >> > Comments??
>> >> >>
>> >> >> As a related attempt, you may have a look at http://codereview.chromium.org/256013/show
>> >> >>
>> >> >> That allows spans with a mix of committed/decommitted pages (but only in the returned list), as committing seems to work fine if some pages are already committed.
>> >> >>
>> >> >> That has some minor performance benefit, but I didn't investigate it in detail yet.
>> >> >>
>> >> >> just my 2 cents,
>> >> >> anton.
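As one way to picture option (b) in Jim's list above, here is a hedged sketch of a ratio-driven decision. The function name, the 25% budget, and the 8 MB cutoff are invented for illustration; none of this is from the thread, from tcmalloc, or from Chromium.

  // Illustrative-only sketch of option (b): decide whether a newly coalesced
  // span should stay committed, based on the in-use vs. free-but-committed
  // ratio.  All names and thresholds here are hypothetical.
  #include <cstddef>

  struct HeapStats {
    size_t in_use_bytes;          // bytes handed out to the application
    size_t free_committed_bytes;  // bytes on free lists but still committed
  };

  // Returns true if the merged span of `span_bytes` should stay committed.
  bool ShouldKeepCoalescedSpanCommitted(const HeapStats& stats,
                                        size_t span_bytes) {
    // Hypothetical knob: keep at most 25% of in-use bytes sitting around
    // as free-but-committed memory.
    const double kMaxFreeCommittedRatio = 0.25;

    // Never keep very large spans committed "just in case"; they are the
    // jank time bombs described earlier in the thread.
    const size_t kAlwaysDecommitBytes = 8 * 1024 * 1024;  // 8 MB, arbitrary
    if (span_bytes >= kAlwaysDecommitBytes) return false;

    const double budget =
        kMaxFreeCommittedRatio * static_cast<double>(stats.in_use_bytes);
    return static_cast<double>(stats.free_committed_bytes + span_bytes) <= budget;
  }

Option (a), spans that mix committed and decommitted pages, is roughly what the codereview Anton links above experiments with for the returned list.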
