On Thu, Apr 28, 2011 at 4:58 PM, Xinliang David Li <davi...@google.com> wrote:
>
> + Honza
>
> This patch may be a candidate for trunk as well. This feature allows
> profile collection with much less overhead (for multi-threaded
> programs with hot regions, the slowdown can be significant due to the
> cache ping-pong effect of counter updates) without sacrificing too
> much performance.
>

At the extreme I saw the overhead drop from 30x to 2x on actual server
applications, but 10x to 2x was more common.  A 10x overhead may not be
an issue for some workloads; SPEC has its "train" input for that.  But
when you have a server that needs to warm up for 3 hours before the
function profile becomes relevant, going from 10x to 2x makes a
qualitative difference.

I'm stating the obvious, but, for the record, note that turning this
on for single-threaded applications would actually add overhead
(about 30%), as the sampling code is more expensive than a plain
counter update on a single core.  That's why it's not turned on by
default.

>
> Another usage for this support is that it allows profile collection to
> be turned on/off asynchronously for long-running server programs, where
> profile data from the warm-up period is sometimes not important and
> should be excluded.
>

For completeness: at some point I tried adding two wrappers, the first
an on/off switch and the second this proposed sampling wrapper.  But
code size almost doubled and overhead went up significantly, so I
ditched the on/off switch.  The workaround is to start with a very
large sampling rate and then make a call into libgcov to reset the
rate at runtime, once you're ready to measure.
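As a sketch of that workaround, assuming purely for illustration that
libgcov exports the sampling rate as a variable named
__gcov_sampling_rate (the actual symbol and interface may differ in the
patch): build and start the server with a huge rate so essentially
nothing is recorded during warm-up, then lower it from an admin hook
when you want real data.

  #include <stdint.h>

  /* Assumed to be provided by libgcov when sampling is enabled; the
     name is illustrative.  The program starts with this set very high
     (e.g. via a flag or environment variable), so the warm-up phase
     contributes almost nothing to the profile.  */
  extern int64_t __gcov_sampling_rate;

  /* Called from the server once warm-up is done, e.g. from an admin
     RPC or a signal handler, to start collecting a meaningful profile.  */
  void
  enable_real_profiling (void)
  {
    __gcov_sampling_rate = 100;   /* whatever rate you actually want */
  }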

> A known limitation is that value profiling is not yet sampled, but it
> does not seem to cause problems.
>
> David

Thank you, Easwaran and David, for bringing this upstream.  Mea culpa.
Silvius
