On x86_64 systems, this test inmate measures the time required to read a
value from main memory. It uses rdtsc to count the CPU cycles taken by
the access. The access can either be cached or uncached; in the uncached
case, the cache line is flushed before the access.
This tool repeats the measurement 10,000,000 times and outputs the
average number of cycles required for the access. Before running the
actual measurements, a dummy measurement determines the average overhead
of a single measurement.

This is quite useful, as it gives insight into differences between the
root and the non-root cell: with little effort, the tool can also be run
on Linux. If the 'overhead' time differs between root and non-root cell,
this can indicate timing or speed differences between the two. If the
'uncached' or 'cached' average time differs between the non-root and
root cell, this indicates that the two may have different hardware
configurations / setups.

The host tool can be compiled with:

$ gcc -Os -Wall -Wextra -fno-stack-protector -mno-red-zone -o cache-timing ./inmates/tests/x86/cache-timings-host.c

Signed-off-by: Ralf Ramsauer <[email protected]>
---
Hi Jan,

what do you think about a test inmate like this one? It's still an RFC
patch, as I'm not sure if the measurement setup is correct. In
particular, I might have too many fences.

This test could be extended to run permanently and show the results of
the last 1e3, 1e5 and 1e6 runs. With that, the tool could be used to
monitor influences of the root cell on the non-root cell's caches.

And btw: on a Xeon Gold 5118, we get the following values on Linux
resp. in the non-root cell:

Linux:
$ ./cache-timing
Measurement rounds: 10000000
Determining measurement overhead...
 -> Average measurement overhead: 37 cycles
Measuring uncached memory access...
 -> Average uncached memory access: 222 cycles
Measuring cached memory access...
 -> Average cached memory access: 9 cycles

Non-Root:
Cell "apic-demo" can be loaded
Started cell "apic-demo"
CPU 3 received SIPI, vector 100
Measurement rounds: 10000000
Determining measurement overhead...
 -> Average measurement overhead: 82 cycles
Measuring uncached memory access...
 -> Average uncached memory access: 247 cycles
Measuring cached memory access...
 -> Average cached memory access: 19 cycles

Cached access on Linux is 2x faster than in the non-root cell - if my
test is correct. This can probably be explained by different cache
configurations. Uncached access happens at almost the same speed. But do
you have an explanation why the overhead measurement is more than 2x
faster on Linux than in the non-root cell?

Thanks
  Ralf

 inmates/tests/x86/Makefile               |  3 +-
 inmates/tests/x86/cache-timings-common.c | 95 ++++++++++++++++++++++++
 inmates/tests/x86/cache-timings-host.c   | 27 +++++++
 inmates/tests/x86/cache-timings.c        | 15 ++++
 4 files changed, 139 insertions(+), 1 deletion(-)
 create mode 100644 inmates/tests/x86/cache-timings-common.c
 create mode 100644 inmates/tests/x86/cache-timings-host.c
 create mode 100644 inmates/tests/x86/cache-timings.c

diff --git a/inmates/tests/x86/Makefile b/inmates/tests/x86/Makefile
index 6c8dc0e7..6e529dde 100644
--- a/inmates/tests/x86/Makefile
+++ b/inmates/tests/x86/Makefile
@@ -12,8 +12,9 @@
 
 include $(INMATES_LIB)/Makefile.lib
 
-INMATES := mmio-access.bin mmio-access-32.bin sse-demo.bin sse-demo-32.bin
+INMATES := cache-timings.bin mmio-access.bin mmio-access-32.bin sse-demo.bin sse-demo-32.bin
 
+cache-timings-y := cache-timings.o
 mmio-access-y := mmio-access.o
 $(eval $(call DECLARE_32_BIT,mmio-access-32))
 
diff --git a/inmates/tests/x86/cache-timings-common.c b/inmates/tests/x86/cache-timings-common.c
new file mode 100644
index 00000000..0edf65e6
--- /dev/null
+++ b/inmates/tests/x86/cache-timings-common.c
@@ -0,0 +1,95 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) OTH Regensburg, 2020
+ *
+ * Authors:
+ *  Ralf Ramsauer <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#define ROUNDS (10 * 1000 * 1000)
+
+union tscval {
+        struct {
+                u32 lo;
+                u32 hi;
+        } __attribute__((packed));
+        u64 val;
+} __attribute__((packed));
+
+static u32 victim;
+
+static inline void clflush(void *addr)
+{
+        asm volatile("clflush %0\t\n"
+                     "mfence\t\n"
+                     "lfence\t\n" : "+m" (*(volatile char *)addr));
+}
+
+#define MEASUREMENT_OVERHEAD "nop\t\n"
+#define MEASUREMENT_COMMAND "mov (%%rbx), %%ebx\t\n"
+#define DECLARE_MEASUREMENT(name, flush, meas)                          \
+        static inline u64 measure_##name(u32 *victim)                   \
+        {                                                               \
+                union tscval before, after;                             \
+                                                                        \
+                if (flush)                                              \
+                        clflush(victim);                                \
+                asm volatile("mov %4, %%rbx\t\n"                        \
+                             "lfence\t\n"                               \
+                             "rdtsc\t\n"                                \
+                             "lfence\t\n"                               \
+                                                                        \
+                             meas                                       \
+                                                                        \
+                             "mov %%eax, %%ebx\t\n"                     \
+                             "mov %%edx, %%ecx\t\n"                     \
+                             "lfence\t\n"                               \
+                             "rdtsc\t\n"                                \
+                             "lfence\t\n"                               \
+                             "mov %%ebx, %0\t\n"                        \
+                             "mov %%ecx, %1\t\n"                        \
+                             "mov %%eax, %2\t\n"                        \
+                             "mov %%edx, %3\t\n"                        \
+                             : "=m" (before.lo), "=m" (before.hi),      \
+                               "=m" (after.lo), "=m" (after.hi)         \
+                             : "m" (victim)                             \
+                             : "eax", "rbx", "ecx", "edx");             \
+                return after.val - before.val;                          \
+        }
+
+DECLARE_MEASUREMENT(overhead, false, MEASUREMENT_OVERHEAD)
+DECLARE_MEASUREMENT(cached, false, MEASUREMENT_COMMAND)
+DECLARE_MEASUREMENT(uncached, true, MEASUREMENT_COMMAND)
+
+static inline u64 avg_measurement(u64 (*meas)(u32*), u32 *victim,
+                                  unsigned int rounds, u64 overhead)
+{
+        u64 cycles = 0;
+        unsigned int i;
+
+        for (i = 0; i < rounds; i++)
+                cycles += meas(victim) - overhead;
+        return cycles / rounds;
+}
+
+void inmate_main(void)
+{
+        u64 cycles, overhead;
+
+        printk("Measurement rounds: %u\n", ROUNDS);
+        printk("Determining measurement overhead...\n");
+        overhead = avg_measurement(measure_overhead, &victim, ROUNDS, 0);
+        printk(" -> Average measurement overhead: %llu cycles\n", overhead);
+
+        printk("Measuring uncached memory access...\n");
+        cycles = avg_measurement(measure_uncached, &victim, ROUNDS, overhead);
+        printk(" -> Average uncached memory access: %llu cycles\n", cycles);
+
+        printk("Measuring cached memory access...\n");
+        cycles = avg_measurement(measure_cached, &victim, ROUNDS, overhead);
+        printk(" -> Average cached memory access: %llu cycles\n", cycles);
+}
diff --git a/inmates/tests/x86/cache-timings-host.c b/inmates/tests/x86/cache-timings-host.c
new file mode 100644
index 00000000..229db904
--- /dev/null
+++ b/inmates/tests/x86/cache-timings-host.c
@@ -0,0 +1,27 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) OTH Regensburg, 2020
+ *
+ * Authors:
+ *  Ralf Ramsauer <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <stdbool.h>
+#include <stdio.h>
+
+#define printk printf
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+#include "cache-timings-common.c"
+
+int main(void)
+{
+        inmate_main();
+        return 0;
+}
diff --git a/inmates/tests/x86/cache-timings.c b/inmates/tests/x86/cache-timings.c
new file mode 100644
index 00000000..1acc3ee9
--- /dev/null
+++ b/inmates/tests/x86/cache-timings.c
@@ -0,0 +1,15 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) OTH Regensburg, 2020
+ *
+ * Authors:
+ *  Ralf Ramsauer <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <inmate.h>
+
+#include "cache-timings-common.c"
-- 
2.28.0
