On 03.10.20 01:56, Ralf Ramsauer wrote:
> On x86_64 systems, this test inmate measures the time that is required
> to read a value from main memory. Via rdtsc, it measures the CPU cycles
> that are required for the access. Access can happen either cached or
> uncached. In the case of uncached access, the cache line is flushed
> before the access.
> 
> This tool repeats the measurement 10e6 times and outputs the average
> number of cycles required for the access. Before running the actual
> measurement, a dummy test is used to determine the average overhead of
> one single measurement.
> 
> And that's pretty useful, because this tool gives a lot of insight into
> differences between the root and the non-root cell: with little effort,
> we can also run it on Linux.
> 
> If the 'overhead' time differs between the root and non-root cell, this
> can be an indicator of timing or speed differences between the two
> cells.
> 
> If the 'uncached' or 'cached' average time differs between the non-root
> and root cell, it's an indicator that both might have different
> hardware configurations / setups.
> 
> The host tool can be compiled with:
> $ gcc -Os -Wall -Wextra -fno-stack-protector -mno-red-zone \
>       -o cache-timing ./inmates/tests/x86/cache-timings-host.c
> 
> Signed-off-by: Ralf Ramsauer <[email protected]>
> ---
> 
> Hi Jan,
> 
> what do you think about a test inmate like this one? It's still an RFC
> patch, as I'm not sure whether the measurement setup is correct. In
> particular, I might have too many fences.
> 
> This test could be extended to run permanently and show the results of
> the last 1e3, 1e5 and 1e6 runs. With that, this tool could be used to
> monitor influences of the root cell on the non-root cell's caches.

Such benchmarks aren't bad. However, the current form does not qualify
for the test folder yet IMHO: there is no functional test and no easy
evaluation of the benchmark results from which to derive a pass/fail
criterion.

> 
> 
> Aaand btw: On a Xeon Gold 5118, we get the following values on Linux
> and in the non-root cell, respectively:
> 
> Linux:
> $ ./cache-timing
> Measurement rounds: 10000000
> Determining measurement overhead...
>   -> Average measurement overhead: 37 cycles
> Measuring uncached memory access...
>   -> Average uncached memory access: 222 cycles
> Measuring cached memory access...
>   -> Average cached memory access: 9 cycles
> 

Linux native or Linux in Jailhouse?

> Non-Root:
> Cell "apic-demo" can be loaded
> Started cell "apic-demo"
> CPU 3 received SIPI, vector 100
> Measurement rounds: 10000000
> Determining measurement overhead...
>   -> Average measurement overhead: 82 cycles
> Measuring uncached memory access...
>   -> Average uncached memory access: 247 cycles
> Measuring cached memory access...
>   -> Average cached memory access: 19 cycles

How does this compare to Linux in Jailhouse (if the above was native)?

> 
> Cached access on Linux is 2x faster than in the non-root cell - if my
> test is correct. This can probably be explained by different cache
> configurations. Uncached access happens at almost the same speed.
> 
> But do you have an explanation why the overhead measurement is more
> than 2x faster on Linux than in the non-root cell?
> 

Not yet, but I need the full picture first.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
