On Jan 15, 2014, at 1:09 AM, Evgeniy Ivanov <[email protected]> wrote:

> On Tue, Jan 14, 2014 at 10:22 PM, Jason Evans <[email protected]> wrote:
>> On Dec 22, 2013, at 11:41 PM, Evgeniy Ivanov <[email protected]> wrote:
>>> I need to profile my application running in production. Is it safe,
>>> performance-wise, to build jemalloc with "--enable-prof", start the
>>> application with profiling disabled, and enable it for a short time
>>> (probably via a mallctl() call) when I need it? I'm mostly interested
>>> in stacks, i.e. opt.prof_accum. Or are there better alternatives on
>>> Linux? I've tried perf, but it just counts stacks and doesn't care
>>> about the amount of memory allocated. There is also stap, but I
>>> haven't tried it yet.
>>
>> Yes, you can use jemalloc's heap profiling as you describe, with
>> essentially no performance impact while heap profiling is inactive.
>> You may even be able to leave heap profiling active all the time with
>> little performance impact, depending on how heavily your application
>> uses malloc. At Facebook we leave heap profiling active all the time
>> for a wide variety of server applications; there are only a couple of
>> exceptions I'm aware of for which the performance impact is
>> unacceptable (heavy malloc use, ~2% slowdown when heap profiling is
>> active).
>
> What settings had you been using, and what was measured, when you got
> the 2% slowdown?
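As a rough sketch of the runtime toggle described above (assuming jemalloc
was built with --enable-prof and the process started with something like
MALLOC_CONF="prof:true,prof_active:false"; depending on how jemalloc was
configured, the entry point may be je_mallctl rather than mallctl):

    /* Minimal sketch: enable heap profiling for a window of interest,
     * dump a profile, then disable profiling again. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void set_prof_active(bool active)
    {
        /* "prof.active" is a read/write boolean control. */
        if (mallctl("prof.active", NULL, NULL, &active, sizeof(active)) != 0)
            fprintf(stderr, "mallctl(\"prof.active\") failed\n");
    }

    int main(void)
    {
        set_prof_active(true);   /* start sampling allocations */

        /* ... run the workload you want to profile ... */

        /* Write a heap profile; with no filename supplied the dump goes to
         * a file named from opt.prof_prefix (e.g. jeprof.out.<pid>...). */
        if (mallctl("prof.dump", NULL, NULL, NULL, 0) != 0)
            fprintf(stderr, "mallctl(\"prof.dump\") failed\n");

        set_prof_active(false);  /* back to near-zero overhead */
        return 0;
    }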
My vague recollection is that the app was heavily multi-threaded and spent
about 10% of its total time in malloc. Therefore a 2% overall slowdown
corresponded to a ~20% slowdown in jemalloc itself. Note that the size class
distribution matters to heap profiling performance, because there are two
sources of overhead (counter maintenance and backtracing), but I don't
remember what the distribution looked like. We were using a version of
libunwind that had a backtrace caching mechanism built in (it was never
accepted upstream, and libunwind's current caching mechanism cannot safely
be used by malloc).

> In our (latency-related) test I got the following results:
> normal jemalloc:    99% <= 87 usec (avg: 65 usec)
> inactive profiling: 99% <= 88 usec (avg: 66 usec)
>
> MALLOC_CONF="prof:true,prof_active:true,lg_prof_sample:19,prof_accum:true,prof_prefix:jeprof.out"

We usually use prof_accum:false, mainly because complicated call graphs can
cause a huge number of retained backtraces, but otherwise your settings
match.

> prof-libgcc:    99% <= 125 usec (avg: 70 usec)
> prof-libunwind: 99% <= 146 usec (avg: 76 usec)
>
> So on average the slowdown is 6% for libgcc and 15% for libunwind. But at
> the tail of the distribution (99% <= X) the slowdown is 42% or 65%
> depending on the library, which is a huge difference. For 64 KB the
> numbers are dramatic: a 154% (99% <= X) performance loss.
>
> Am I missing something in the configuration?

If your application is spending ~10-30% of its time in malloc, then your
numbers sound reasonable. You may find that a lower sampling rate (e.g.
lg_prof_sample:21) drops backtracing overhead enough that performance is
acceptable. I've experimented in the past with lower sampling rates, and for
long-running applications I've found that the heap profiles are still
totally usable, because the total allocation volume is high.

Jason

_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
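A rough sketch of sanity-checking which profiling options actually took
effect: the opt.* controls are read-only, so a lower sampling rate such as
lg_prof_sample:21 has to be set via MALLOC_CONF at startup, but it can be
read back at runtime. Control names are from the jemalloc manual; as above,
the entry point may be je_mallctl depending on the build.

    /* Sketch: read back the profiling options set via MALLOC_CONF, e.g.
     * MALLOC_CONF="prof:true,lg_prof_sample:21,prof_accum:false". */
    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int main(void)
    {
        size_t lg_sample, sz = sizeof(lg_sample);
        if (mallctl("opt.lg_prof_sample", &lg_sample, &sz, NULL, 0) == 0)
            printf("average sample interval: 2^%zu bytes allocated\n",
                   lg_sample);

        bool accum;
        sz = sizeof(accum);
        if (mallctl("opt.prof_accum", &accum, &sz, NULL, 0) == 0)
            printf("prof_accum: %s\n", accum ? "true" : "false");

        bool active;
        sz = sizeof(active);
        if (mallctl("prof.active", &active, &sz, NULL, 0) == 0)
            printf("profiling currently %s\n",
                   active ? "active" : "inactive");

        return 0;
    }

Linking against jemalloc (e.g. -ljemalloc on Linux) and running under the
adjusted MALLOC_CONF should print the effective sampling interval.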
