[PATCH 1/2] perf bench futex: cache align the worker struct

Sebastian Andrzej Siewior Sun, 16 Oct 2016 12:09:32 -0700

perf testing showed that the worker itself consumes some amount of CPU.
This boils down to the increment of `ops`, which causes cache line
bouncing between the individual threads.
This patch aligns the struct to 256 bytes to ensure that no cache line
is shared among CPUs. 128 bytes is the x86 worst case, and grep says
that L1_CACHE_SHIFT is set to 8 (1 << 8 = 256 bytes) on s390.
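
For illustration only (not part of the patch), here is a minimal sketch
of what the alignment attribute does to the layout of an array of
workers. It assumes GCC or Clang, uses uint32_t in place of u_int32_t
to stay portable, and ops_stride() is a hypothetical helper that exists
only for this example:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SMP_CACHE_BYTES 256
#define __cacheline_aligned __attribute__ ((aligned (SMP_CACHE_BYTES)))

/* Same fields as the worker struct in futex-hash.c (uint32_t assumed). */
struct worker {
	int tid;
	uint32_t *futex;
	pthread_t thread;
	unsigned long ops;
} __cacheline_aligned;

/*
 * The aligned attribute raises the alignment of the type, and sizeof()
 * is always a multiple of the alignment, so every array element starts
 * on its own 256-byte boundary.
 */
static_assert(_Alignof(struct worker) == SMP_CACHE_BYTES,
	      "worker must be aligned to SMP_CACHE_BYTES");
static_assert(sizeof(struct worker) % SMP_CACHE_BYTES == 0,
	      "worker size must be a multiple of SMP_CACHE_BYTES");

/*
 * Hypothetical helper: distance in bytes between the `ops` counters of
 * two adjacent elements of a worker array.
 */
static size_t ops_stride(const struct worker *w)
{
	return (size_t)((const char *)&w[1].ops - (const char *)&w[0].ops);
}
```

With this layout the `ops` counters of two neighbouring workers are at
least SMP_CACHE_BYTES apart, so they cannot share a cache line on
hardware whose line size does not exceed 256 bytes.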


Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
 tools/perf/bench/futex-hash.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 8024cd5febd2..d9e5e80bb4d0 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -39,12 +39,15 @@ static unsigned int threads_starting;
 static struct stats throughput_stats;
 static pthread_cond_t thread_parent, thread_worker;
 
+#define SMP_CACHE_BYTES 256
+#define __cacheline_aligned __attribute__ ((aligned (SMP_CACHE_BYTES)))
+
 struct worker {
        int tid;
        u_int32_t *futex;
        pthread_t thread;
        unsigned long ops;
-};
+} __cacheline_aligned;
 
 static const struct option options[] = {
        OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
-- 
2.9.3
