On Jan 15, 2014, at 1:09 AM, Evgeniy Ivanov <[email protected]> wrote:

> On Tue, Jan 14, 2014 at 10:22 PM, Jason Evans <[email protected]> wrote:
>> On Dec 22, 2013, at 11:41 PM, Evgeniy Ivanov <[email protected]> wrote:
>>> I need to profile my application running in production. Is it safe,
>>> performance-wise, to build jemalloc with "--enable-prof", start the
>>> application with profiling disabled, and enable it for a short time
>>> (probably via a mallctl() call) when I need it? I'm mostly interested
>>> in stacks, i.e. opt.prof_accum. Or are there better alternatives on
>>> Linux? I've tried perf, but it just counts stacks and doesn't care
>>> about the amount of memory allocated. There is also stap, but I
>>> haven't tried it yet.
>>
>> Yes, you can use jemalloc's heap profiling as you describe, with
>> essentially no performance impact while heap profiling is inactive.
>> You may even be able to leave heap profiling active all the time with
>> little performance impact, depending on how heavily your application
>> uses malloc. At Facebook we leave heap profiling active all the time
>> for a wide variety of server applications; there are only a couple of
>> exceptions I'm aware of for which the performance impact is
>> unacceptable (heavy malloc use, ~2% slowdown when heap profiling is
>> active).
>
> What settings had you been using, and what was measured, when you got
> the 2% slowdown?
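As a rough sketch of the runtime toggle described above (assuming jemalloc
was built with --enable-prof and the process started with something like
MALLOC_CONF="prof:true,prof_active:false"; depending on how jemalloc was
configured, the entry point may be je_mallctl rather than mallctl):

    /* Minimal sketch: enable heap profiling for a window of interest,
     * dump a profile, then disable profiling again. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void set_prof_active(bool active)
    {
        /* "prof.active" is a read/write boolean control. */
        if (mallctl("prof.active", NULL, NULL, &active, sizeof(active)) != 0)
            fprintf(stderr, "mallctl(\"prof.active\") failed\n");
    }

    int main(void)
    {
        set_prof_active(true);   /* start sampling allocations */

        /* ... run the workload you want to profile ... */

        /* Write a heap profile; with no filename supplied the dump goes to
         * a file named from opt.prof_prefix (e.g. jeprof.out.<pid>...). */
        if (mallctl("prof.dump", NULL, NULL, NULL, 0) != 0)
            fprintf(stderr, "mallctl(\"prof.dump\") failed\n");

        set_prof_active(false);  /* back to near-zero overhead */
        return 0;
    }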
My vague recollection is that the app was heavily multi-threaded and spent
about 10% of its total time in malloc. Therefore a 2% overall slowdown
corresponded to a ~20% slowdown in jemalloc itself. Note that the size class
distribution matters to heap profiling performance, because there are two
sources of overhead (counter maintenance and backtracing), but I don't
remember what the distribution looked like. We were using a version of
libunwind that had a backtrace caching mechanism built in (it was never
accepted upstream, and libunwind's current caching mechanism cannot safely
be used by malloc).

> In our (latency-related) test I got the following results:
> normal jemalloc:    99% <= 87 usec (avg: 65 usec)
> inactive profiling: 99% <= 88 usec (avg: 66 usec)
>
> MALLOC_CONF="prof:true,prof_active:true,lg_prof_sample:19,prof_accum:true,prof_prefix:jeprof.out"

We usually use prof_accum:false, mainly because complicated call graphs can
cause a huge number of retained backtraces, but otherwise your settings
match.

> prof-libgcc:    99% <= 125 usec (avg: 70 usec)
> prof-libunwind: 99% <= 146 usec (avg: 76 usec)
>
> So on average the slowdown is 6% for libgcc and 15% for libunwind. But at
> the tail of the distribution (99% <= X) the slowdown is 42% or 65%
> depending on the library, which is a huge difference. For 64 KB the
> numbers are dramatic: a 154% (99% <= X) performance loss.
>
> Am I missing something in the configuration?

If your application is spending ~10-30% of its time in malloc, then your
numbers sound reasonable. You may find that a lower sampling rate (e.g.
lg_prof_sample:21) drops backtracing overhead enough that performance is
acceptable. I've experimented in the past with lower sampling rates, and for
long-running applications I've found that the heap profiles are still
totally usable, because the total allocation volume is high.

Jason

_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
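A rough sketch of sanity-checking which profiling options actually took
effect: the opt.* controls are read-only, so a lower sampling rate such as
lg_prof_sample:21 has to be set via MALLOC_CONF at startup, but it can be
read back at runtime. Control names are from the jemalloc manual; as above,
the entry point may be je_mallctl depending on the build.

    /* Sketch: read back the profiling options set via MALLOC_CONF, e.g.
     * MALLOC_CONF="prof:true,lg_prof_sample:21,prof_accum:false". */
    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int main(void)
    {
        size_t lg_sample, sz = sizeof(lg_sample);
        if (mallctl("opt.lg_prof_sample", &lg_sample, &sz, NULL, 0) == 0)
            printf("average sample interval: 2^%zu bytes allocated\n",
                   lg_sample);

        bool accum;
        sz = sizeof(accum);
        if (mallctl("opt.prof_accum", &accum, &sz, NULL, 0) == 0)
            printf("prof_accum: %s\n", accum ? "true" : "false");

        bool active;
        sz = sizeof(active);
        if (mallctl("prof.active", &active, &sz, NULL, 0) == 0)
            printf("profiling currently %s\n",
                   active ? "active" : "inactive");

        return 0;
    }

Linking against jemalloc (e.g. -ljemalloc on Linux) and running under the
adjusted MALLOC_CONF should print the effective sampling interval.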
