On 03.10.20 01:56, Ralf Ramsauer wrote:
> On x86_64 systems, this test inmate measures the time that is required
> to read a value from main memory. Via rdtsc, it measures the CPU cycles
> that are required for the access. Access can either happen cached or
> uncached. In case of uncached access, the cache line will be flushed
> before the access.
>
> This tool repeats the measurement 10e6 times and outputs the average
> number of cycles required per access. Before running the actual
> measurement, a dummy test is used to determine the average overhead of
> a single measurement.
>
> And that's pretty useful, because this tool gives a lot of insight into
> differences between the root and the non-root cell: with tiny effort,
> we can also run it on Linux.
>
> If the 'overhead' time differs between the root and non-root cell, this
> can be an indicator that there are timing or speed differences between
> the root and non-root cell.
>
> If the 'uncached' or 'cached' average time differs between the non-root
> and root cell, it's an indicator that both might have different
> hardware configurations / setups.
>
> The host tool can be compiled with:
> $ gcc -Os -Wall -Wextra -fno-stack-protector -mno-red-zone \
>       -o cache-timing ./inmates/tests/x86/cache-timings-host.c
>
> Signed-off-by: Ralf Ramsauer <[email protected]>
> ---
>
> Hi Jan,
>
> what do you think about a test inmate like this one? It's still an RFC
> patch, as I'm not sure if the measurement setup is correct. In
> particular, I might have too many fences.
>
> This test could be extended to run permanently and show the results of
> the last 1e3, 1e5 and 1e6 runs. Having this, the tool could be used to
> monitor influences of the root cell on the non-root cell's caches.
Such benchmarks aren't bad. However, in its current form this does not
qualify for the test folder yet, IMHO: it is no functional test, and
there is no easy evaluation of the benchmark results that could
generate a pass/fail criterion.

>
> Aaand btw: On a Xeon Gold 5118, we have the following values on Linux
> resp. in the non-root cell:
>
> Linux:
> $ ./cache-timing
> Measurement rounds: 10000000
> Determining measurement overhead...
> -> Average measurement overhead: 37 cycles
> Measuring uncached memory access...
> -> Average uncached memory access: 222 cycles
> Measuring cached memory access...
> -> Average cached memory access: 9 cycles

Linux native, or Linux in Jailhouse?

> Non-Root:
> Cell "apic-demo" can be loaded
> Started cell "apic-demo"
> CPU 3 received SIPI, vector 100
> Measurement rounds: 10000000
> Determining measurement overhead...
> -> Average measurement overhead: 82 cycles
> Measuring uncached memory access...
> -> Average uncached memory access: 247 cycles
> Measuring cached memory access...
> -> Average cached memory access: 19 cycles

How does this compare to Linux in Jailhouse (if the above was native)?

>
> Cached access on Linux is 2x faster than in the non-root cell - if my
> test is correct. This can probably be explained by different cache
> configurations. Uncached access happens at almost the same speed.
>
> But do you have an explanation why the overhead measurement is more
> than 2x faster on Linux than in the non-root cell?

Not yet, but I need the full picture first.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
