sorry---once again sent from the wrong account.

On Thu, Oct 1, 2009 at 12:36 AM, Anton Muhin <[email protected]> wrote:
> On Wed, Sep 30, 2009 at 10:22 PM, James Robinson <[email protected]> wrote:
>>
>>
>> On Wed, Sep 30, 2009 at 11:17 AM, Mike Belshe <[email protected]> wrote:
>>>
>>> On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <[email protected]> wrote:
>>>>
>>>> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <[email protected]> wrote:
>>>> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <[email protected]> wrote:
>>>> >> > If you're not interested in TCMalloc customization for Chromium, you
>>>> >> > should
>>>> >> > stop reading now.
>>>> >> > This post is meant to gather some discussion on a topic before I
>>>> >> > code
>>>> >> > and
>>>> >> > land a change.
>>>> >> > MOTIVATION
>>>> >> > We believe poor memory utilization is at the heart of a lot of jank
>>>> >> > problems.  Such problems may be difficult to repro in short
>>>> >> > controlled
>>>> >> > benchmarks, but our users are telling us we have problems, so we
>>>> >> > know we
>>>> >> > have problems.  As a result, we need to be more conservative in
>>>> >> > memory
>>>> >> > utilization and handling.
>>>> >> > SUMMARY OF CHANGE
>>>> >> > I'm thinking of changing our TCMalloc so that when a span is freed
>>>> >> > into TCMalloc's free list, and it gets coalesced with an adjacent
>>>> >> > span that is already decommitted, the coalesced span should be
>>>> >> > entirely decommitted (as opposed to our current customized behavior
>>>> >> > of committing the entire span).
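>>>> >> > Roughly, in pseudo-code (names are illustrative only, not the
>>>> >> > actual page_heap code), the proposal is:
>>>> >> >
>>>> >> >   // On free: coalesce first, then pick a single commitment state.
>>>> >> >   Span* merged = CoalesceWithNeighbors(freed_span);
>>>> >> >   if (AnyNeighborWasDecommitted(merged)) {
>>>> >> >     DecommitSpan(merged);  // proposed: decommit the whole thing
>>>> >> >   }                        // (today we commit the whole thing)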
>>>> >> > This proposed policy was put in place previously by Mike, but
>>>> >> > (reportedly) caused a 3-5% perf regression in V8.  I believe AntonM
>>>> >> > changed that policy to what we have currently, where we always
>>>> >> > ensure full commitment of a coalesced span (regaining V8 performance
>>>> >> > on a benchmark).
>>>> >>
>>>> >> The immediate question and plea.  Question: how can we estimate
>>>> >> performance implications of the change?  Yes, we have some internal
>>>> >> benchmarks which could be used for that (they release memory heavily).
>>>> >>  Anything else?
>>>> >>
>>>> >> Plea: please do not regress DOM performance unless there are really
>>>> >> compelling reasons.  And even then :)
>>>> >
>>>> > Anton -
>>>> > All evidence from user complaints and bug reports is that Chrome uses
>>>> > too much memory.  If you load Chrome on a 1GB system, you can feel it
>>>> > yourself.
>>>> >  Unfortunately, we have yet to build a reliable swapping benchmark.  By
>>>> > allowing tcmalloc to accumulate large chunks of unused pages, we
>>>> > increase
>>>> > the chance that paging will occur on the system.  But because paging is
>>>> > a
>>>> > system-wide activity, it can hit our various processes in unpredictable
>>>> > ways
>>>> > - and this leads to jank.  I think the jank is worse than the benchmark
>>>> > win.
>>>> > I wish we had a better way to quantify the damage caused by paging.
>>>> >  Jim and
>>>> > others are working on that.
>>>> > But it's clear to me that we're just being a memory pig for what is
>>>> > really a
>>>> > modest gain on a semi-obscure benchmark right now.  Using the current
>>>> > algorithms, we have literally multi-hundred megabyte memory usage
>>>> > swings in
>>>> > exchange for 3% on a benchmark.  Don't you agree this is the wrong
>>>> > tradeoff?
>>>> >  (DOM benchmark grows to 500+MB right now; when you switch tabs it
>>>> > drops to
>>>> > <100MB).  We've seen similar behavior on other pages (e.g., loading
>>>> > the histograms page).
>>>> > We may be able to put in some algorithms which are more aware of the
>>>> > current
>>>> > available memory going forward, but I agree with Jim that there will be
>>>> > a
>>>> > lot of negative effects as long as we continue to have such large
>>>> > memory
>>>> > swings.
>>>>
>>>> Mike, I completely agree that we should reduce memory usage.  On the
>>>> other hand, speed has always been one of Chrome's trademarks.  My
>>>> feeling is that more committed pages in the free list make us faster
>>>> (but yes, there is paging etc.).  That's exactly the reason I asked
>>>> for some way to quantify the quality of the different approaches, esp.
>>>> given the classic memory vs. speed dilemma; ideally (imho) both speed
>>>> and memory usage should be considered.
>>>
>>> The team is working on benchmarks.
>>> I think the evidence of paging is pretty overwhelming.
>>> Paging and jank are far worse than the small perf boost on dom node
>>> creation.  I don't believe the benchmark in question is a significant driver
>>> of primary performance.  Do you?
>>
>> I agree completely that this seems to be an issue.  Here's what about:tcmalloc
>> says about my browser process right now (which is at around 267MB according
>> to the app's Task Manager):
>>
>> MALLOC:    207097856 (  197.5 MB) Heap size
>> MALLOC:     12494760 (   11.9 MB) Bytes in use by application
>> MALLOC:    188563456 (  179.8 MB) Bytes free in page heap
>>
>> Seems like just a little bit too much memory is being committed for 12MB of
>> live objects.
>> It might be possible to have it both ways by keeping a small 'buffer' of
>> committed pages rather than always committing or decommitting everything in
>> the free lists.  If we kept a limited size buffer around of committed pages
>> just for 'new' allocations but tried to decommit everything past the buffer
>> it should be possible to keep allocations fast without blowing through tons
>> of memory.  I'm going to try to experiment with this a bit and see if it
>> looks promising.
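>> Something like this (a rough sketch; the names and the budget are made
>> up, not real tcmalloc code):
>>
>>   // Keep at most kCommittedBudget bytes of free-but-committed pages.
>>   static const size_t kCommittedBudget = 16 << 20;  // e.g. 16 MB
>>
>>   void OnSpanFreed(Span* span) {
>>     free_committed_bytes_ += span->length << kPageShift;
>>     // New allocations are served from the committed buffer, so they
>>     // stay fast; anything beyond the budget gets decommitted eagerly.
>>     while (free_committed_bytes_ > kCommittedBudget) {
>>       Span* victim = PickOldestCommittedFreeSpan();
>>       DecommitSpan(victim);
>>       free_committed_bytes_ -= victim->length << kPageShift;
>>     }
>>   }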
>
> That is one of several heuristics implemented in JSC's version of
> tcmalloc (see http://trac.webkit.org/changeset/46511 which was
> massaged and brought into tcmalloc).  I was originally thinking that
> IdleHandler is invoked when the thread is idle, which would allow us
> to decommit committed pages, but then I learned that's not the case.
>
> I'd be curious to see if periodic (JSC-style) or idle scavenging
> could satisfy both allocation peaks and lowered memory usage.
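>
> Something along these lines, perhaps (sketch only, names made up):
>
>   // Background scavenger: wake up periodically and decommit some of
>   // the free-but-committed pages if the heap has been quiet.
>   void ScavengerLoop() {
>     for (;;) {
>       SleepSeconds(kScavengeDelayInSeconds);
>       if (pages_allocated_since_last_scavenge_ == 0) {
>         // Decommit only a fraction per pass, so allocation bursts
>         // right after a scavenge still mostly hit committed pages.
>         DecommitFreePages(free_committed_pages_ / 2);
>       }
>       pages_allocated_since_last_scavenge_ = 0;
>     }
>   }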
>
> yours,
> anton.
>
>> - James
>>
>>>
>>> Mike
>>>
>>>>
>>>> yours,
>>>> anton.
>>>>
>>>> > Mike
>>>> >
>>>> >
>>>> >
>>>> >>
>>>> >> > WHY CHANGE?
>>>> >> > The problematic scenario I'm anticipating (and may currently be
>>>> >> > burning
>>>> >> > us)
>>>> >> > is:
>>>> >> > a) A (renderer) process allocates a lot of memory, and achieves a
>>>> >> > significant high water mark of memory used.
>>>> >> > b) The process deallocates a lot of memory, and it flows into the
>>>> >> > TCMalloc
>>>> >> > free list. [We still have a lot of memory attributed to that
>>>> >> > process,
>>>> >> > and
>>>> >> > the app as a whole shows as using that memory.]
>>>> >> > c) We eventually decide to decommit a lot of our free memory.
>>>> >> >  Currently
>>>> >> > this happens when we switch away from a tab. [This saves us from
>>>> >> > further
>>>> >> > swapping out the unused memory].
>>>> >> > Now comes the evil problem.
>>>> >> > d) We return to the tab which has a giant free list of spans, most
>>>> >> > of
>>>> >> > which
>>>> >> > are decommitted.  [The good news is that the memory is still
>>>> >> >  decommitted]
>>>> >> > e) We allocate a block of memory, such as a 32k chunk.  This memory
>>>> >> > is pulled from a decommitted span, and ONLY the allocated chunk is
>>>> >> > committed.  [That sounds good]
>>>> >> > f) We free the block of memory from (e).  Whatever span is adjacent
>>>> >> > to that block is committed <potential oops>.  Hence, if we took (e)
>>>> >> > from a 200Meg span, the act of freeing (e) will cause a 200Meg
>>>> >> > commitment!?!  This in turn would not only require touching (and
>>>> >> > having VirtualAlloc clear to zero) all allocated memory in the large
>>>> >> > span, it will also immediately put memory pressure on the OS, and
>>>> >> > force as much as 200Megs of other apps to be swapped out to disk
>>>> >> > :-(.
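>>>> >> > (For reference, "commit"/"decommit" here are the Win32 primitives,
>>>> >> > roughly:
>>>> >> >
>>>> >> >   // give the physical storage back, but keep the range reserved
>>>> >> >   VirtualFree(start, length, MEM_DECOMMIT);
>>>> >> >   // re-commit; pages come back zero-filled when touched again
>>>> >> >   VirtualAlloc(start, length, MEM_COMMIT, PAGE_READWRITE);
>>>> >> >
>>>> >> > flags as in the Win32 API.)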
>>>> >>
>>>> >> I'm not sure swapping happens unless you touch those now-committed
>>>> >> pages, but only an experiment will tell.
>>>> >>
>>>> >> > I'm wary that our recent fix, which allows spans to be (correctly)
>>>> >> > coalesced independent of their size, makes it even easier to
>>>> >> > coalesce spans.  Worse yet, as we proceed to further optimize
>>>> >> > TCMalloc, one measure of success will be that the list of spans
>>>> >> > will be fragmented less and less, and we'll have larger and larger
>>>> >> > coalesced singular spans.  Any large "reserved" but not "committed"
>>>> >> > span will be a jank time-bomb waiting to blow up if the process
>>>> >> > ever allocates/frees from such a large span :-(.
>>>> >> >
>>>> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and regain
>>>> >> > performance, etc.)
>>>> >> > We have at least the following plausible alternative ways to move
>>>> >> > forward
>>>> >> > with TCMalloc.  The overall goal is to avoid wasteful decommits, and
>>>> >> > at
>>>> >> > the
>>>> >> > same time avoid heap-wide flailing between minimal and maximal span
>>>> >> > commitment states.
>>>> >> > Each free-span is currently the maximal contiguous region of memory
>>>> >> > that
>>>> >> > TCMalloc is controlling, but has been deallocated.  Currently spans
>>>> >> > have
>>>> >> > to
>>>> >> > be totally committed, or totally decommitted.  There is no mixture
>>>> >> > supported.
>>>> >> > a) We could re-architect the span handling to allow spans to be
>>>> >> > combinations
>>>> >> > of committed and decommitted regions.
>>>> >> > b) We could vary our policy on what to do with a coalesced span,
>>>> >> > based on span size and memory pressure.  For example: we can
>>>> >> > consistently monitor the in-use vs. free (but committed) ratio, and
>>>> >> > try to stay in some "acceptable" region by varying our policy (see
>>>> >> > the sketch below).
>>>> >> > c) We could actually return to the OS some portions of spans that we
>>>> >> > have
>>>> >> > decommitted.  We could then let the OS give us back these regions if
>>>> >> > we
>>>> >> > need
>>>> >> > memory.  Until we get them back, we would not be at risk of doing
>>>> >> > unnecessary commits.  Decisions about when to return to the OS can
>>>> >> > be
>>>> >> > made
>>>> >> > based on span size and memory pressure.
>>>> >> > d) We can change the interval and forcing function for decommitting
>>>> >> > spans
>>>> >> > that are in our free list.
>>>> >> > In each of the above cases, we need benchmark data on user-class
>>>> >> > machines to
>>>> >> > show costs of these changes.  Until we understand the memory impact,
>>>> >> > we
>>>> >> > need
>>>> >> > to move forward conservatively in our action, and be vigilant for
>>>> >> > thrashing
>>>> >> > scenarios.
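>>>> >> > For (b), the check could be as simple as (illustrative only; the
>>>> >> > thresholds and names are made up):
>>>> >> >
>>>> >> >   // Decide what to do with a freshly coalesced span based on how
>>>> >> >   // much free-but-committed memory we are already holding.
>>>> >> >   double ratio = static_cast<double>(free_committed_bytes) /
>>>> >> >                  (in_use_bytes + 1.0);
>>>> >> >   if (ratio > kHighWaterRatio)
>>>> >> >     DecommitSpan(merged_span);  // hoarding; give pages back
>>>> >> >   else
>>>> >> >     CommitSpan(merged_span);    // plenty of headroom; keep it fast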
>>>> >> >
>>>> >> > Comments??
>>>> >>
>>>> >> For something close to this, you may have a look at
>>>> >> http://codereview.chromium.org/256013/show
>>>> >>
>>>> >> That allows spans with a mix of committed/decommitted pages (but only
>>>> >> in the returned list), as committing seems to work fine if some pages
>>>> >> are already committed.
>>>> >>
>>>> >> That has some minor performance benefit, but I didn't investigate it
>>>> >> in detail yet.
>>>> >>
>>>> >> just my 2 cents,
>>>> >> anton.
>>>> >
>>>> >
>>>
>>
>>
>
