On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
> Sebastian noted that overhead for worker thread ops (throughput)
> accounting was producing 'perf' to appear in the profiles, consuming
> a non-trivial (ie 13%) amount of CPU. This is due to cacheline
> bouncing due to the increment of w->ops. We can easily fix this by
> just working on a local copy and updating the actual worker once
> done running, and ready to show the program summary. There is no
> danger of the worker being concurrent, so we can trust that no stale
> value is being seen by another thread.
>
> Reported-by: Sebastian Andrzej Siewior <bige...@linutronix.de>

Acked-by: Sebastian Andrzej Siewior <bige...@linutronix.de>

> --- a/tools/perf/bench/futex-hash.c
> +++ b/tools/perf/bench/futex-hash.c
> @@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
>  static void *workerfn(void *arg)
>  {
>  	int ret;
> -	unsigned int i;
>  	struct worker *w = (struct worker *) arg;
> +	unsigned int i;
> +	unsigned long ops = w->ops; /* avoid cacheline bouncing */

we start at 0 so there is probably no need to init it with w->ops.

Sebastian
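
For reference, a minimal standalone sketch of the pattern under
discussion: each thread counts in a local variable that starts at 0 and
stores the result into its worker exactly once, after its loop is done.
This is not the perf code. The cut-down struct worker, the fixed
iteration count and the run_workers() helper are all assumptions made
for illustration, and the loop body only stands in for a futex
operation.

```c
/* Sketch only: struct worker is a cut-down, hypothetical stand-in and
 * run_workers() is an illustrative helper, not code from perf. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct worker {
	pthread_t thread;
	unsigned long ops;	/* adjacent array entries may share a cacheline */
};

static unsigned long iterations;	/* work per thread, set before any thread starts */

static void *workerfn(void *arg)
{
	struct worker *w = arg;
	unsigned long ops = 0;	/* private counter, starts at 0 */
	unsigned long i;

	for (i = 0; i < iterations; i++)
		ops++;		/* stands in for one futex operation */

	w->ops = ops;		/* single store to shared memory, once done */
	return NULL;
}

/* Run nthreads workers (nthreads > 0) and return the summed op count. */
static unsigned long run_workers(unsigned int nthreads, unsigned long iters)
{
	struct worker *workers = calloc(nthreads, sizeof(*workers));
	unsigned long total = 0;
	unsigned int i;
	int ret;

	assert(workers);
	iterations = iters;

	for (i = 0; i < nthreads; i++) {
		ret = pthread_create(&workers[i].thread, NULL, workerfn, &workers[i]);
		assert(ret == 0);
	}
	for (i = 0; i < nthreads; i++) {
		ret = pthread_join(workers[i].thread, NULL);
		assert(ret == 0);
		total += workers[i].ops;	/* read only after the worker has exited */
	}
	free(workers);
	return total;
}
```

With this sketch, run_workers(4, 1000) returns 4000. The summary side
reads w->ops only after pthread_join(), so it cannot observe a
half-updated count.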