Thanks for the reminder. I have now switched to master + central, and the
result is similar (11MB ~ 13MB of Rss saved right after GC, or more before
GC [1]). This is a small patch, and all modifications reside in
BionicGlue.cpp :-)

[1] It has an effect similar to GC, because garbage is never accessed again
until it is collected.

----- Original Message -----
From: "Nicolas B. Pierron" <[email protected]>
To: [email protected]
Sent: Wednesday, August 7, 2013 2:17:40 AM
Subject: Re: [b2g] Some observations on memory usages in main process

First of all, you should not base your benchmarks on b2g18, unless you intend
to backport the patch and have already done the work on central. This sounds
like a big change, and 1.2 will be released with Gecko 26, which is coming
soon.

On 08/05/2013 02:37 AM, Ting-Yuan Huang wrote:
> Because SunSpider and Octane are always killed by the OOM killer on
> unagi-b2g18-256M, the tests are conducted on unagi-b2g18-512M. The peak
> resident set size (VmHWM) of the browser when running Kraken is only ~96MB,
> so the results with 256M and 512M should be very close.
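For reference, a minimal sketch of reading the peak resident set size
(VmHWM) and the current resident set size (VmRSS) of a process from
/proc/<pid>/status on Linux. The helper names parse_status and read_status
are hypothetical; the sketch assumes the usual "Key:   value kB" line format
of that file.

```python
def parse_status(text, fields=("VmHWM", "VmRSS")):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in fields:
            # Memory lines look like "VmHWM:\t   98304 kB"; keep the number.
            values[key] = int(rest.split()[0])
    return values


def read_status(pid="self", fields=("VmHWM", "VmRSS")):
    """Read /proc/<pid>/status (Linux only) and parse the requested fields."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read(), fields)
```

For example, read_status(browser_pid)["VmHWM"] gives the peak RSS in kB,
where browser_pid is whatever pid the browser process has on the device.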

SunSpider, killed by the OOM killer? I call that a huge regression. Even when
I checked b2g18 back in April, I never saw SunSpider killed by the OOM killer
on an unagi with 256MB of RAM.

> The numbers are averaged over 3 samples. Each sample was taken right after
> rebooting.

You need to be careful: the phone does a lot of work when you switch
applications, and that is precisely why you need to let the system settle
down before starting the benchmarks.

>                   original   patched
> SunSpider (ms)    3503       3445
> Kraken (ms)       71554      71726
> Octane (score)    231        228

Seriously [1], when will we stop using b2g18?

> And, although this is very subjective, I didn't notice any latency
> introduced by this patch while interacting with the UI.
>
> Any suggestions?

Yes: run the benchmarks on the latest Gecko and report numbers that can be
compared with AWFY [1]. I also suggest that you look at the harness I made
for AWFY to run your benchmarks [2, 3].

[1] http://arewefastyet.com/#machine=14
[2] https://github.com/nbp/gaia-ui-tests/tree/bench
[3] https://github.com/nbp/arewefastyet/tree/master/driver

> ----- Original Message -----
> From: "Jonas Sicking" <[email protected]>
> To: "Ting-Yuan Huang" <[email protected]>
> Cc: [email protected], "Thinker Lee" <[email protected]>, "Justin 
> Lebar" <[email protected]>
> Sent: Sunday, August 4, 2013 2:00:19 PM
> Subject: Re: [b2g] Some observations on memory usages in main process
>
> What was the performance impact apart from the initial writing out of
> pages? I.e. did you check if any operations got slower due to memory
> reads that cause these pages to be read back in?
>
> / Jonas
>
> On Tue, Jul 30, 2013 at 2:55 AM, Ting-Yuan Huang <[email protected]> wrote:
>> A proof-of-concept implementation that tracks calls to the mmap() family
>> is in [2]. On receiving a signal, it writes anonymous memory to files and
>> maps the files back, so that the non-varying parts can be evicted and
>> rarely used data can be swapped out.
>>
>> In the test [1], it saved about 17MB out of 46MB of anonymous memory on
>> average without manually triggering GC, or 10MB out of 38MB right after
>> GC. The size of the varying part (measured by summing Private_Dirty pages)
>> is quite stable, at around 22MB. The non-varying working set
>> (Private_Clean) is around 4MB ~ 7MB.
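The measurement above can be approximated with a short script over
/proc/<pid>/smaps. This is a sketch, not the tooling used in the test: the
function names are hypothetical, and the rule for what counts as
"anonymous" (no pathname, "[heap]", or an "[anon:...]" name) is an
assumption.

```python
def sum_anon_private(smaps_text):
    """Sum Private_Clean and Private_Dirty (kB) over anonymous mappings."""
    totals = {"Private_Clean": 0, "Private_Dirty": 0}
    anonymous = False
    for line in smaps_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if not parts[0].endswith(":"):
            # Mapping header: "start-end perms offset dev inode [pathname]".
            path = parts[5] if len(parts) > 5 else ""
            anonymous = path in ("", "[heap]") or path.startswith("[anon:")
        elif anonymous and parts[0][:-1] in totals:
            # Field line such as "Private_Dirty:   512 kB".
            totals[parts[0][:-1]] += int(parts[1])
    return totals


def read_anon_private(pid="self"):
    """Read /proc/<pid>/smaps (Linux only) and sum the private anon pages."""
    with open(f"/proc/{pid}/smaps") as f:
        return sum_anon_private(f.read())
```

Private_Dirty approximates the varying part and Private_Clean the
non-varying working set, in the sense used above.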
>>
>> In some extreme cases, such as running SunSpider 1.0, the Rss (committed
>> physical memory) of the browser reached ~130MB with this patch and ~100MB
>> without it before being killed. In other words, it gains an extra 30MB of
>> memory at the expense of performance.
>>
>> The downside of this patch is that it takes about 5 seconds to write out
>> all the tracked anonymous memory. This would cause an unacceptable freeze
>> of the device, including the UI. A possible way to ease the lag is to
>> write out the memory incrementally; this will be implemented in the next
>> WIP patch.
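One way to picture "writing out the memory incrementally" is a copy that
proceeds one chunk per step, so that other work (such as UI events) can run
between steps instead of waiting for the whole 5-second write. This is a
minimal sketch under assumptions: the chunk size and function names are
hypothetical, and it is not how the WIP patch does it.

```python
import os
import tempfile

CHUNK = 64 * 1024  # hypothetical chunk size; smaller chunks mean shorter pauses


def write_out_incrementally(region, fd, chunk=CHUNK):
    """Copy `region` to file descriptor `fd`, at most one chunk per step.

    This is a generator: each next() performs one write and yields the
    number of bytes written so far, letting the caller interleave other work.
    """
    view = memoryview(region)
    offset = 0
    while offset < len(view):
        # pwrite may write fewer bytes than asked; advance by what it wrote.
        offset += os.pwrite(fd, view[offset:offset + chunk], offset)
        yield offset


def spill_to_file(region, directory=None, chunk=CHUNK):
    """Write `region` to a new temporary file incrementally; return its path."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in write_out_incrementally(region, fd, chunk):
            pass  # a real caller would return to its event loop here
    finally:
        os.close(fd)
    return path
```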
>>
>> The other performance impacts are complicated and still under evaluation.
>> Since the data are written out, the tests will include both cold-cache and
>> warm-cache cases.
>>
>> [1] Open Twitter and Facebook and look around.
>> [2] https://bugzilla.mozilla.org/show_bug.cgi?id=899493
>>
>> ----- Original Message -----
>> From: "Justin Lebar" <[email protected]>
>> To: "Ting-Yuan Huang" <[email protected]>
>> Cc: "Thinker Lee" <[email protected]>, [email protected]
>> Sent: Thursday, July 4, 2013 1:23:08 AM
>> Subject: Re: [b2g] Some observations on memory usages in main process
>>
>> It's interesting to know that about 30MB of memory is not written to
>> often, but that's only half the story.
>>
>> The other half is: how often are those pages read? If they're read often,
>> there's little we can do at the OS level.
>>
>> My guess would be that there's plenty of memory that's not read very often,
>> but who knows!
>> On Jul 3, 2013 3:49 AM, "Ting-Yuan Huang" <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We recently observed that
>>>
>>> 1.  About half of the anonymous memory in b2g does not change very often.
>>>      We took a snapshot after booting into the lock screen. After playing
>>>      for a while [1], we took another snapshot. The parts common to the
>>>      two snapshots are around 30MB out of 60MB of anonymous memory in
>>>      total.
>>>
>>> 2.  We tried compressing the anonymous memory in b2g, and the compression
>>>      ratio is 25% ~ 50%, depending on the compression method.
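Both observations can be reproduced offline once page contents have been
dumped. In this sketch, each snapshot is assumed to be a dict mapping a
page's virtual address to its bytes (how the dump is taken is not shown).
The 4096-byte page size and zlib are assumptions: the thread does not say
which compression methods were tried. Pages are compressed one at a time,
which is also how zram stores them.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size; check with os.sysconf("SC_PAGE_SIZE")


def unchanged_bytes(before, after):
    """Bytes in pages present in both snapshots with identical contents."""
    return sum(len(page) for addr, page in before.items()
               if after.get(addr) == page)


def compression_ratio(pages, level=6):
    """Compressed size / original size, compressing each page separately."""
    raw = sum(len(p) for p in pages)
    packed = sum(len(zlib.compress(p, level)) for p in pages)
    return packed / raw
```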
>>>
>>> It seems that we can take advantage of these characteristics. For 1, we
>>> might either
>>>
>>> a.  Write that anonymous memory into files and map the files back after
>>>      booting, so that the pages act as page cache and the OS can evict
>>>      them under memory pressure.
>>> b.  Intercept mmap() to redirect all mmap(..., MAP_ANONYMOUS, ...) calls
>>>      to files. This is very similar to a.
>>> c.  Enable swap.
>>>
>>> All of these rely on the Linux kernel evicting clean pages before dirty
>>> pages. We must tune some parameters to strike a balance among
>>> performance, memory usage and flash lifetime.
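Option a can be illustrated with Python's mmap module: copy an anonymous
mapping into a file, then map the file privately (ACCESS_COPY, which is
MAP_PRIVATE on Unix). Until the process writes to them, the new pages are
served from the file's page cache, so once that data has been written back
to storage the kernel can drop them under pressure and fault them in again
from the file. This is only a sketch: the function name is hypothetical, and
a real implementation such as the PoC would have to remap at the original
address (e.g. with MAP_FIXED) so existing pointers stay valid, which this
sketch does not do.

```python
import mmap
import os
import tempfile


def move_to_file_backed(anon, directory=None):
    """Copy an anonymous mapping into a file and map that file privately.

    Returns (file-backed mapping, path). Writes to the returned mapping
    create private dirty copies and never modify the file.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "r+b") as f:
        f.write(anon[:])  # anon[:] copies the mapping's contents as bytes
        f.flush()         # the file must be full-size before mapping it
        backed = mmap.mmap(f.fileno(), len(anon), access=mmap.ACCESS_COPY)
    # The mapping stays valid after the file object is closed.
    return backed, path
```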
>>>
>>> For 2, we might enable in-kernel memory compression [2]. For example,
>>> zRam is an in-memory swap device: pages are compressed when they are
>>> swapped out and decompressed when they are swapped in, so nothing is
>>> written to flash. We might also add some APIs to give the kernel hints.
>>> For example, when an application is moved to the background, it is
>>> expected to be less active and becomes a good candidate for being
>>> compressed and swapped out.
>>>
>>> What do you think?
>>>
>>> [1] Open Facebook and Twitter and look around, take a picture, run
>>> SunSpider, and so on.
>>> [2] http://lwn.net/Articles/545244/
>>> _______________________________________________
>>> dev-b2g mailing list
>>> [email protected]
>>> https://lists.mozilla.org/listinfo/dev-b2g
>>>


-- 
Nicolas B. Pierron