On 10/4/20 8:38 PM, Jan Kiszka wrote:
> On 03.10.20 01:56, Ralf Ramsauer wrote:
>> On x86_64 systems, this test inmate measures the time that is required
>> to read a value from main memory. Via rdtsc, it measures the CPU cycles
>> that are required for the access. Access can either happen cached, or
>> uncached. In case of uncached access, the cache line will be flushed
>> before access.
>>
>> This tool repeats the measurement for 10e6 times, and outputs the
>> average cycles that were required for the access. Before accessing the
>> actual measurement, a dummy test is used to determine the average
>> overhead of one single measurement.
>>
>> And that's pretty useful, because this tool gives a lot of insights of
>> differences between the root and the non-root cell: With tiny effort, we
>> can also run it on Linux.
>>
>> If the 'overhead' time differs between root and non-root cell, this can
>> be an indicator that there might be some timing or speed differences
>> between the root and non-root cell.
>>
>> If the 'uncached' or 'cached' average time differs between the non-root
>> and root cell, it's an indicator that both might have different hardware
>> configurations / setups.
>>
>> The host tool can be compiled with:
>> $ gcc -Os -Wall -Wextra -fno-stack-protector -mno-red-zone -o cache-timing \
>>       ./inmates/tests/x86/cache-timings-host.c
>>
>> Signed-off-by: Ralf Ramsauer <[email protected]>
>> ---
>>
>> Hi Jan,
>>
>> what do you think about a test inmate like this one? It's still a RFC patch,
>> as I'm not sure if the measurement setup is correct. Especially I might have
>> too much fences.
>>
>> This test could be extended to run permanently and show the results of the
>> last 1e3, 1e5 and 1e6 runs. Having this, this tool could be used to monitor
>> influences of the root cell on the non-root cell's caches.
>
> Such benchmarks aren't bad. However, the current form does not qualify
> for the test folder yet IMHO: no functional test, no easy evaluation of
> benchmark results in order to generate a pass/fail criterion.

Ack, will move it to demos/. Before posting a v2: Did you have the chance
to look at the usage of the fences? I'm pretty sure that I might have
messed up something.

>
>>
>>
>> Aaand btw: On a Xeon Gold 5118, we have following values on Linux resp. in
>> the non-root cell:
>>
>> Linux:
>> $ ./cache-timing
>> Measurement rounds: 10000000
>> Determining measurement overhead...
>> -> Average measurement overhead: 37 cycles
>> Measuring uncached memory access...
>> -> Average uncached memory access: 222 cycles
>> Measuring cached memory access...
>> -> Average cached memory access: 9 cycles
>>
>
> Linux native or Linux in Jailhouse?
>
>> Non-Root:
>> Cell "apic-demo" can be loaded
>> Started cell "apic-demo"
>> CPU 3 received SIPI, vector 100
>> Measurement rounds: 10000000
>> Determining measurement overhead...
>> -> Average measurement overhead: 82 cycles
>> Measuring uncached memory access...
>> -> Average uncached memory access: 247 cycles
>> Measuring cached memory access...
>> -> Average cached memory access: 19 cycles
>
> How does this compare to Linux in Jailhouse (if the above was native)?

Ok, the following table shows the three numbers (in cycles) for overhead /
uncached / cached:

Measurement            | OH | U$  | $
-----------------------+----+-----+-----
Linux native           | 37 | 222 | 9
Linux root             | 37 | 226 | 9
Linux non-root         | 37 | 215 | 9
libinmate non-root [1] | 82 | 266 | 19
libinmate non-root [2] | 36 | 217 | 8

I get the numbers of [1], if I load cache-timings.bin to a freshly created
cell, IOW:

$ jh cell create my-cell
$ jh cell load my-cell cache-timings.bin
$ jh cell start my-cell

Those numbers can be reproduced if I reload the cell (i.e., w/o destroying
it).

But in that very same cell, I will get the numbers of [2], if I load/start
Linux first and THEN reload the cell with cache-timings.bin (w/o destroying
it in between). IOW:

$ jh cell load linux my-cell ...
$ jh cell start my-cell
$ jh cell load my-cell cache-timings.bin
$ jh cell start my-cell

Interesting.
This means that Linux must have left some configuration artefacts. Still
unclear what exactly.

>
>>
>> Cached Access on Linux is 2x faster than in the non-root cell - if my test is
>> correct. This can - probably - be explained by different cache
>> configurations. Uncached access happens at almost the same speed.
>>
>> But do you have an explanation why the overhead measurement is more than 2x
>> faster on Linux than in the non-root cell?
>>
>
> Not yet, but I need the full picture first.

Hope the numbers above help.

Thanks!
  Ralf

>
> Jan
>
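
For readers following the fence question: below is a minimal sketch of the kind of measurement loop discussed in this thread. It is not the code from the patch. The names measure_once(), average_cycles(), enum access_mode and ROUNDS are hypothetical, and the fence placement (lfence around rdtsc, rdtscp plus lfence at the end, mfence after clflush) is one commonly used pattern, assumed here rather than taken from cache-timings.c. It assumes x86_64 with GCC or Clang.

```c
/* Sketch only, x86_64 with GCC/Clang. All helper names are hypothetical
 * and are not taken from the patch under discussion. */
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_clflush, _mm_lfence, _mm_mfence */

#define ROUNDS 100000u /* assumption: far fewer rounds than the tool's 10e6 */

enum access_mode { MODE_OVERHEAD, MODE_CACHED, MODE_UNCACHED };

/* The memory location being timed, on its own 64-byte cache line. */
static volatile uint64_t target __attribute__((aligned(64)));

/* Time a single round and return the elapsed TSC ticks. */
static inline uint64_t measure_once(enum access_mode mode)
{
    unsigned int aux;
    uint64_t start, end;

    if (mode == MODE_UNCACHED) {
        /* Evict the line, then wait until the flush has completed. */
        _mm_clflush((const void *)&target);
        _mm_mfence();
    }

    _mm_lfence();          /* earlier instructions finish before rdtsc */
    start = __rdtsc();
    _mm_lfence();          /* rdtsc finishes before the load is issued */

    if (mode != MODE_OVERHEAD) {
        uint64_t v = target; /* the timed load (volatile read) */
        (void)v;
    }

    end = __rdtscp(&aux);  /* waits for preceding instructions to execute */
    _mm_lfence();          /* later instructions do not start early */

    return end - start;
}

/* Average over ROUNDS rounds. MODE_OVERHEAD times the empty harness. */
uint64_t average_cycles(enum access_mode mode)
{
    uint64_t sum = 0;
    unsigned int i;

    target = 1; /* touch the line once so the cached case starts warm */
    for (i = 0; i < ROUNDS; i++)
        sum += measure_once(mode);

    return sum / ROUNDS;
}
```

Calling average_cycles() once per mode yields three numbers comparable in spirit to the OH / U$ / $ columns above. Note that the mode check sits inside the timed region, so it is part of the reported overhead in all three modes.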
