On Wed, Jan 15, 2014 at 8:39 PM, Jason Evans <[email protected]> wrote:
> On Jan 15, 2014, at 1:09 AM, Evgeniy Ivanov <[email protected]> wrote:
>> On Tue, Jan 14, 2014 at 10:22 PM, Jason Evans <[email protected]> wrote:
>>> On Dec 22, 2013, at 11:41 PM, Evgeniy Ivanov <[email protected]> wrote:
>>>> I need to profile my application running in production. Is it
>>>> performance-safe to build jemalloc with "--enable-prof", start the
>>>> application with profiling disabled, and enable it for a short time
>>>> (probably via a mallctl() call) when I need it? I'm mostly interested
>>>> in stacks, i.e. opt.prof_accum. Or are there better alternatives on
>>>> Linux? I've tried perf, but it just counts stacks and doesn't care
>>>> about the amount of memory allocated. There is also stap, but I
>>>> haven't tried it yet.
>>>
>>> Yes, you can use jemalloc's heap profiling as you describe, with
>>> essentially no performance impact while heap profiling is inactive.
>>> You may even be able to leave heap profiling active all the time with
>>> little performance impact, depending on how heavily your application
>>> uses malloc. At Facebook we leave heap profiling active all the time
>>> for a wide variety of server applications; there are only a couple of
>>> exceptions I'm aware of for which the performance impact is
>>> unacceptable (heavy malloc use, ~2% slowdown when heap profiling is
>>> active).
>>
>> What settings were you using, and what was measured, when you got the
>> 2% slowdown?
>
> My vague recollection is that the app was heavily multi-threaded and
> spent about 10% of its total time in malloc. Therefore a 2% overall
> slowdown corresponded to a ~20% slowdown in jemalloc itself. Note that
> size class distribution matters to heap profiling performance because
> there are two sources of overhead (counter maintenance and
> backtracing), but I don't remember what the distribution looked like.
> We were using a version of libunwind that had a backtrace caching
> mechanism built in (it was never accepted upstream, and libunwind's
> current caching mechanism cannot safely be used by malloc).
>
>> In our test (latency-related) I got the following results:
>> normal jemalloc: 99% <= 87 usec (Avg: 65 usec)
>> inactive profiling: 99% <= 88 usec (Avg: 66 usec)
>>
>> MALLOC_CONF="prof:true,prof_active:true,lg_prof_sample:19,prof_accum:true,prof_prefix:jeprof.out"
>
> We usually use prof_accum:false, mainly because complicated call graphs
> can cause a huge number of retained backtraces, but otherwise your
> settings match.
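The runtime toggle mentioned earlier in the thread (building with "--enable-prof", starting with profiling inactive, and enabling it only for a short window via mallctl()) might look roughly like the sketch below. This is an illustration assuming a jemalloc built with --enable-prof and linked so the unprefixed mallctl() is visible; "prof.active" and "prof.dump" are jemalloc's documented mallctl names, but the error handling and workload placeholder are mine:

```c
/* Sketch: enable jemalloc heap profiling for a short window, then dump.
 * Assumes jemalloc was built with --enable-prof and the process was
 * started with something like:
 *   MALLOC_CONF="prof:true,prof_active:false,lg_prof_sample:21"
 * so that profiling machinery is compiled in but initially inactive. */
#include <jemalloc/jemalloc.h>
#include <stdbool.h>
#include <stdio.h>

static void set_prof_active(bool active) {
    /* "prof.active" is a writable bool in the mallctl namespace. */
    if (mallctl("prof.active", NULL, NULL, &active, sizeof(active)) != 0)
        fprintf(stderr, "mallctl(\"prof.active\") failed\n");
}

int main(void) {
    set_prof_active(true);   /* start sampling allocations */

    /* ... run the workload you want to profile ... */

    set_prof_active(false);  /* stop sampling */

    /* Write a profile named <opt.prof_prefix>.<pid>.<seq>.m<mseq>.heap. */
    if (mallctl("prof.dump", NULL, NULL, NULL, 0) != 0)
        fprintf(stderr, "mallctl(\"prof.dump\") failed\n");
    return 0;
}
```

Compiled with something like `cc app.c -ljemalloc`, the resulting .heap dumps can then be examined with the jeprof tool shipped with jemalloc. Note that lg_prof_sample:21 in the MALLOC_CONF above reflects the coarser sampling interval discussed later in the thread, not the lg_prof_sample:19 used in the original benchmark.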
Stacks are our primary point of interest. Using DTrace we trace each
malloc, but skip the ones that request less than 16 KB. DTrace overhead
on Solaris is just 13%. I'm not sure allocation statistics would be
useful for us.

>> prof-libgcc: 99% <= 125 usec (Avg: 70 usec)
>> prof-libunwind: 99% <= 146 usec (Avg: 76 usec)
>>
>> So on average the slowdown is 6% for libgcc and 15% for libunwind. But
>> for the distribution (99% < X) the slowdown is 42% or 65% depending on
>> the library, which is a huge difference. For 64 KB the numbers are
>> dramatic: 154% (99% < X) performance loss.
>>
>> Am I missing something in the configuration?
>
> If your application is spending ~10-30% of its time in malloc, then
> your numbers sound reasonable. You may find that a lower sampling rate
> (e.g. lg_prof_sample:21) drops backtracing overhead enough that
> performance is acceptable. I've experimented in the past with lower
> sampling rates, and for long-running applications I've found that the
> heap profiles are still totally usable, because total allocation
> volume is high.

Tested on one of our workloads: for lg_prof_sample:20 we get an 18%
slowdown, and for lg_prof_sample:21 it is just 4.5%, which is absolutely
acceptable.

Jason, thanks a lot for your answers! jemalloc is a really awesome and
powerful thing!

--
Cheers,
Evgeniy

_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
