On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:

> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]> wrote:
> >>
> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
> >> > If you're not interested in TCMalloc customization for Chromium, you
> >> > should stop reading now.
> >> > This post is meant to gather some discussion on a topic before I code and
> >> > land a change.
> >> > MOTIVATION
> >> > We believe poor memory utilization is at the heart of a lot of jank
> >> > problems.  Such problems may be difficult to repro in short controlled
> >> > benchmarks, but our users are telling us we have problems, so we know we
> >> > have problems.  As a result, we need to be more conservative in memory
> >> > utilization and handling.
> >> > SUMMARY OF CHANGE
> >> > I'm thinking of changing our TCMalloc so that when a span is freed into
> >> > TCMalloc's free list and gets coalesced with an adjacent span that is
> >> > already decommitted, the coalesced span is entirely decommitted (as
> >> > opposed to our current customized behavior of committing the entire
> >> > span).
> >> > This proposed policy was put in place previously by Mike, but (reportedly)
> >> > caused a 3-5% perf regression in V8.  I believe AntonM changed that policy
> >> > to what we have currently, where we always ensure full commitment of a
> >> > coalesced span (regaining V8 performance on a benchmark).
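> >> > Roughly, in sketch form (illustrative names only -- these are not the
> >> > actual TCMalloc identifiers):
> >> >
> >> >   struct Span {
> >> >     uintptr_t start;      // first page address
> >> >     size_t    num_pages;  // pages in the span
> >> >     bool      committed;  // whole-span commit state (today's model)
> >> >   };
> >> >
> >> >   void DecommitPages(uintptr_t start, size_t n);  // VirtualFree w/ MEM_DECOMMIT
> >> >   void Coalesce(Span* freed, Span* neighbor);     // merge into one free span
> >> >
> >> >   // Proposed: when a freed span merges with a decommitted neighbor,
> >> >   // decommit the freed pages too, so the merged span stays uniformly
> >> >   // decommitted and free() never commits memory as a side effect.
> >> >   void MergeOnFree(Span* freed, Span* neighbor) {
> >> >     if (!neighbor->committed) {
> >> >       DecommitPages(freed->start, freed->num_pages);
> >> >       freed->committed = false;
> >> >     }
> >> >     Coalesce(freed, neighbor);
> >> >   }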
> >>
> >> The immediate question and plea.  Question: how can we estimate the
> >> performance implications of the change?  Yes, we have some internal
> >> benchmarks which could be used for that (they release memory heavily).
> >> Anything else?
> >>
> >> Plea: please, do not regress DOM performance unless there are really
> >> compelling reasons.  And even in this case :)
> >
> > Anton -
> > All evidence from user complaints and bug reports is that Chrome uses too
> > much memory.  If you load Chrome on a 1GB system, you can feel it yourself.
> >  Unfortunately, we have yet to build a reliable swapping benchmark.  By
> > allowing tcmalloc to accumulate large chunks of unused pages, we increase
> > the chance that paging will occur on the system.  But because paging is a
> > system-wide activity, it can hit our various processes in unpredictable
> > ways - and this leads to jank.  I think the jank is worse than the
> > benchmark win.
> > I wish we had a better way to quantify the damage caused by paging.  Jim
> > and others are working on that.
> > But it's clear to me that we're just being a memory pig for what is really
> > a modest gain on a semi-obscure benchmark right now.  Using the current
> > algorithms, we have literally multi-hundred-megabyte memory usage swings in
> > exchange for 3% on a benchmark.  Don't you agree this is the wrong
> > tradeoff?
> >  (The DOM benchmark grows to 500+MB right now; when you switch tabs it
> > drops to <100MB.)  We've seen similar behavior on other pages (e.g., when
> > loading the histograms page).
> > We may be able to put in some algorithms which are more aware of the
> > currently available memory going forward, but I agree with Jim that there
> > will be a lot of negative effects as long as we continue to have such large
> > memory swings.
>
> Mike, I completely agree that we should reduce memory usage.  On the
> other hand, speed has always been one of Chrome's trademarks.  My feeling
> is that more committed pages in the free list make us faster (but yes,
> there is paging etc.).  That's exactly why I asked for some way to
> quantify the quality of different approaches, especially given the
> classic memory-vs.-speed dilemma; ideally (imho) both speed and memory
> usage should be considered.
>

The team is working on benchmarks.

I think the evidence of paging is pretty overwhelming.

Paging and jank are far worse than the small perf boost on DOM node creation.
 I don't believe the benchmark in question is a significant driver of
real-world performance.  Do you?

Mike


>
> yours,
> anton.
>
> > Mike
> >
> >
> >
> >>
> >> > WHY CHANGE?
> >> > The problematic scenario I'm anticipating (and which may currently be
> >> > burning us) is:
> >> > a) A (renderer) process allocates a lot of memory, reaching a significant
> >> > high-water mark of memory use.
> >> > b) The process deallocates a lot of memory, and it flows into the
> >> > TCMalloc free list.  [We still have a lot of memory attributed to that
> >> > process, and the app as a whole shows as using that memory.]
> >> > c) We eventually decide to decommit a lot of our free memory.  Currently
> >> > this happens when we switch away from a tab.  [This saves us from further
> >> > swapping out the unused memory.]
> >> > Now comes the evil problem.
> >> > d) We return to the tab, which has a giant free list of spans, most of
> >> > which are decommitted.  [The good news is that the memory is still
> >> > decommitted.]
> >> > e) We allocate a block of memory, such as a 32KB chunk.  This memory is
> >> > pulled from a decommitted span, and ONLY the allocated chunk is
> >> > committed.  [That sounds good]
> >> > f) We free the block of memory from (e).  Whatever span is adjacent to
> >> > that block is committed <potential oops>.  Hence, if we took (e) from a
> >> > 200MB span, the act of freeing (e) will cause a 200MB commitment!?!
> >> > This in turn would not only require touching (and having VirtualAlloc
> >> > clear to zero) all allocated memory in the large span, it would also
> >> > immediately put memory pressure on the OS, and force as much as 200MB of
> >> > other apps' memory to be swapped out to disk :-(.
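> >> > In code terms, step (f) today is roughly the mirror image of the sketch
> >> > above (again, illustrative names, not the actual TCMalloc code):
> >> >
> >> >   void CommitPages(uintptr_t start, size_t n);  // VirtualAlloc w/ MEM_COMMIT
> >> >
> >> >   // Current behavior, sketched: freeing a 32KB block whose neighbor is
> >> >   // a decommitted 200MB span recommits the entire merged span.
> >> >   void MergeOnFreeToday(Span* freed, Span* neighbor) {
> >> >     if (!neighbor->committed) {
> >> >       CommitPages(neighbor->start, neighbor->num_pages);  // ~200MB commit!
> >> >       neighbor->committed = true;
> >> >     }
> >> >     Coalesce(freed, neighbor);
> >> >   }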
> >>
> >> I'm not sure swapping would occur unless you touch those now-committed
> >> pages, but only an experiment will tell.
> >>
> >> > I'm wary that our recent fix, which allows spans to be (correctly)
> >> > coalesced independent of their size, will make it easier to coalesce
> >> > spans.  Worse yet, as we proceed to further optimize TCMalloc, one
> >> > measure of success will be that the list of spans will be fragmented
> >> > less and less, and we'll have larger and larger coalesced singular
> >> > spans.  Any large "reserved" but not "committed" span will be a jank
> >> > time-bomb waiting to blow up if the process ever allocates/frees from
> >> > such a large span :-(.
> >> >
> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain
> >> > performance, etc.)
> >> > We have at least the following plausible alternative ways to move forward
> >> > with TCMalloc.  The overall goal is to avoid wasteful decommits, and at
> >> > the same time avoid heap-wide flailing between minimal and maximal span
> >> > commitment states.
> >> > Each free span is currently the maximal contiguous region of memory that
> >> > TCMalloc controls but that has been deallocated.  Currently spans have to
> >> > be totally committed or totally decommitted; no mixture is supported.
> >> > a) We could re-architect the span handling to allow spans to be
> >> > combinations of committed and decommitted regions.
> >> > b) We could vary our policy on what to do with a coalesced span, based on
> >> > span size and memory pressure.  For example: we can continuously monitor
> >> > the in-use vs. free-but-committed ratio and try to stay in some
> >> > "acceptable" region by varying our policy (a sketch follows below).
> >> > c) We could actually return to the OS some portions of spans that we have
> >> > decommitted.  We could then let the OS give us back these regions if we
> >> > need memory.  Until we get them back, we would not be at risk of doing
> >> > unnecessary commits.  Decisions about when to return to the OS can be
> >> > made based on span size and memory pressure.
> >> > d) We can change the interval and forcing function for decommitting spans
> >> > that are in our free list.
> >> > In each of the above cases, we need benchmark data on user-class machines
> >> > to show the costs of these changes.  Until we understand the memory
> >> > impact, we need to move forward conservatively and be vigilant for
> >> > thrashing scenarios.
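> >> > For (b), here is the rough shape of the policy knob I have in mind (the
> >> > threshold is invented purely for illustration):
> >> >
> >> >   enum CoalescePolicy { COMMIT_MERGED, DECOMMIT_MERGED };
> >> >
> >> >   // Keep free-but-committed memory below some fraction of in-use
> >> >   // memory; above that, prefer decommitting newly merged spans.
> >> >   CoalescePolicy ChoosePolicy(size_t free_committed_bytes,
> >> >                               size_t in_use_bytes) {
> >> >     if (free_committed_bytes > in_use_bytes / 4)  // hypothetical 25% cap
> >> >       return DECOMMIT_MERGED;
> >> >     return COMMIT_MERGED;
> >> >   }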
> >> >
> >> > Comments??
> >>
> >> As a close attempt, you may have a look at
> >> http://codereview.chromium.org/256013/show
> >>
> >> That allows spans with a mix of committed/decommitted pages (but only
> >> in the returned list), as committing seems to work fine if some pages
> >> are already committed.
> >>
> >> That has some minor performance benefit, but I haven't investigated it
> >> in detail yet.
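> >> The rough shape of the idea, much simplified (not the actual patch):
> >>
> >>   // A span in the returned list may be only partly committed; since
> >>   // committing an already-committed page is harmless, it is enough to
> >>   // track how many of its pages are still committed for accounting.
> >>   struct ReturnedSpan {
> >>     uintptr_t start;
> >>     size_t    num_pages;        // total pages in the span
> >>     size_t    committed_pages;  // anywhere in [0, num_pages]
> >>   };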
> >>
> >> just my 2 cents,
> >> anton.
> >
> >
>
