On x86_64 systems, this test inmate measures the time required to read a
value from main memory. It uses rdtsc to count the CPU cycles taken by
the access. The access can either be cached or uncached; in the uncached
case, the cache line is flushed before the access.
This tool repeats the measurement 10,000,000 times and outputs the
average number of cycles required for the access. Before running the
actual measurements, a dummy measurement determines the average overhead
of a single measurement.

This is quite useful, as it gives insight into differences between the
root and the non-root cell: with little effort, the tool can also be run
on Linux. If the 'overhead' time differs between root and non-root cell,
this can indicate timing or speed differences between the two. If the
'uncached' or 'cached' average time differs between the non-root and
root cell, this indicates that the two may have different hardware
configurations / setups.

The host tool can be compiled with:

$ gcc -Os -Wall -Wextra -fno-stack-protector -mno-red-zone -o cache-timing ./inmates/tests/x86/cache-timings-host.c

Signed-off-by: Ralf Ramsauer <[email protected]>
---
Hi Jan,

what do you think about a test inmate like this one? It's still an RFC
patch, as I'm not sure if the measurement setup is correct. In
particular, I might have too many fences.

This test could be extended to run permanently and show the results of
the last 1e3, 1e5 and 1e6 runs. With that, the tool could be used to
monitor influences of the root cell on the non-root cell's caches.

And btw: on a Xeon Gold 5118, we get the following values on Linux
resp. in the non-root cell:

Linux:
$ ./cache-timing
Measurement rounds: 10000000
Determining measurement overhead...
 -> Average measurement overhead: 37 cycles
Measuring uncached memory access...
 -> Average uncached memory access: 222 cycles
Measuring cached memory access...
 -> Average cached memory access: 9 cycles

Non-Root:
Cell "apic-demo" can be loaded
Started cell "apic-demo"
CPU 3 received SIPI, vector 100
Measurement rounds: 10000000
Determining measurement overhead...
 -> Average measurement overhead: 82 cycles
Measuring uncached memory access...
 -> Average uncached memory access: 247 cycles
Measuring cached memory access...
 -> Average cached memory access: 19 cycles

Cached access on Linux is 2x faster than in the non-root cell - if my
test is correct. This can probably be explained by different cache
configurations. Uncached access happens at almost the same speed. But do
you have an explanation why the overhead measurement is more than 2x
faster on Linux than in the non-root cell?

Thanks
  Ralf

 inmates/tests/x86/Makefile               |  3 +-
 inmates/tests/x86/cache-timings-common.c | 95 ++++++++++++++++++++++++
 inmates/tests/x86/cache-timings-host.c   | 27 +++++++
 inmates/tests/x86/cache-timings.c        | 15 ++++
 4 files changed, 139 insertions(+), 1 deletion(-)
 create mode 100644 inmates/tests/x86/cache-timings-common.c
 create mode 100644 inmates/tests/x86/cache-timings-host.c
 create mode 100644 inmates/tests/x86/cache-timings.c

diff --git a/inmates/tests/x86/Makefile b/inmates/tests/x86/Makefile
index 6c8dc0e7..6e529dde 100644
--- a/inmates/tests/x86/Makefile
+++ b/inmates/tests/x86/Makefile
@@ -12,8 +12,9 @@
 
 include $(INMATES_LIB)/Makefile.lib
 
-INMATES := mmio-access.bin mmio-access-32.bin sse-demo.bin sse-demo-32.bin
+INMATES := cache-timings.bin mmio-access.bin mmio-access-32.bin sse-demo.bin sse-demo-32.bin
 
+cache-timings-y := cache-timings.o
 mmio-access-y := mmio-access.o
 $(eval $(call DECLARE_32_BIT,mmio-access-32))
 
diff --git a/inmates/tests/x86/cache-timings-common.c b/inmates/tests/x86/cache-timings-common.c
new file mode 100644
index 00000000..0edf65e6
--- /dev/null
+++ b/inmates/tests/x86/cache-timings-common.c
@@ -0,0 +1,95 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) OTH Regensburg, 2020
+ *
+ * Authors:
+ *  Ralf Ramsauer <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#define ROUNDS (10 * 1000 * 1000)
+
+union tscval {
+        struct {
+                u32 lo;
+                u32 hi;
+        } __attribute__((packed));
+        u64 val;
+} __attribute__((packed));
+
+static u32 victim;
+
+static inline void clflush(void *addr)
+{
+        asm volatile("clflush %0\t\n"
+                     "mfence\t\n"
+                     "lfence\t\n" : "+m" (*(volatile char *)addr));
+}
+
+#define MEASUREMENT_OVERHEAD "nop\t\n"
+#define MEASUREMENT_COMMAND "mov (%%rbx), %%ebx\t\n"
+#define DECLARE_MEASUREMENT(name, flush, meas)                          \
+        static inline u64 measure_##name(u32 *victim)                   \
+        {                                                               \
+                union tscval before, after;                             \
+                                                                        \
+                if (flush)                                              \
+                        clflush(victim);                                \
+                asm volatile("mov %4, %%rbx\t\n"                        \
+                             "lfence\t\n"                               \
+                             "rdtsc\t\n"                                \
+                             "lfence\t\n"                               \
+                                                                        \
+                             meas                                       \
+                                                                        \
+                             "mov %%eax, %%ebx\t\n"                     \
+                             "mov %%edx, %%ecx\t\n"                     \
+                             "lfence\t\n"                               \
+                             "rdtsc\t\n"                                \
+                             "lfence\t\n"                               \
+                             "mov %%ebx, %0\t\n"                        \
+                             "mov %%ecx, %1\t\n"                        \
+                             "mov %%eax, %2\t\n"                        \
+                             "mov %%edx, %3\t\n"                        \
+                             : "=m" (before.lo), "=m" (before.hi),      \
+                               "=m" (after.lo), "=m" (after.hi)         \
+                             : "m" (victim)                             \
+                             : "eax", "rbx", "ecx", "edx");             \
+                return after.val - before.val;                          \
+        }
+
+DECLARE_MEASUREMENT(overhead, false, MEASUREMENT_OVERHEAD)
+DECLARE_MEASUREMENT(cached, false, MEASUREMENT_COMMAND)
+DECLARE_MEASUREMENT(uncached, true, MEASUREMENT_COMMAND)
+
+static inline u64 avg_measurement(u64 (*meas)(u32*), u32 *victim,
+                                  unsigned int rounds, u64 overhead)
+{
+        u64 cycles = 0;
+        unsigned int i;
+
+        for (i = 0; i < rounds; i++)
+                cycles += meas(victim) - overhead;
+        return cycles / rounds;
+}
+
+void inmate_main(void)
+{
+        u64 cycles, overhead;
+
+        printk("Measurement rounds: %u\n", ROUNDS);
+        printk("Determining measurement overhead...\n");
+        overhead = avg_measurement(measure_overhead, &victim, ROUNDS, 0);
+        printk(" -> Average measurement overhead: %llu cycles\n", overhead);
+
+        printk("Measuring uncached memory access...\n");
+        cycles = avg_measurement(measure_uncached, &victim, ROUNDS, overhead);
+        printk(" -> Average uncached memory access: %llu cycles\n", cycles);
+
+        printk("Measuring cached memory access...\n");
+        cycles = avg_measurement(measure_cached, &victim, ROUNDS, overhead);
+        printk(" -> Average cached memory access: %llu cycles\n", cycles);
+}
diff --git a/inmates/tests/x86/cache-timings-host.c b/inmates/tests/x86/cache-timings-host.c
new file mode 100644
index 00000000..229db904
--- /dev/null
+++ b/inmates/tests/x86/cache-timings-host.c
@@ -0,0 +1,27 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) OTH Regensburg, 2020
+ *
+ * Authors:
+ *  Ralf Ramsauer <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <stdbool.h>
+#include <stdio.h>
+
+#define printk printf
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+#include "cache-timings-common.c"
+
+int main(void)
+{
+        inmate_main();
+        return 0;
+}
diff --git a/inmates/tests/x86/cache-timings.c b/inmates/tests/x86/cache-timings.c
new file mode 100644
index 00000000..1acc3ee9
--- /dev/null
+++ b/inmates/tests/x86/cache-timings.c
@@ -0,0 +1,15 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) OTH Regensburg, 2020
+ *
+ * Authors:
+ *  Ralf Ramsauer <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <inmate.h>
+
+#include "cache-timings-common.c"
-- 
2.28.0
