On 04/10/2020 22:16, Ralf Ramsauer wrote:
> On 10/4/20 8:38 PM, Jan Kiszka wrote:
>> On 03.10.20 01:56, Ralf Ramsauer wrote:
>>> On x86_64 systems, this test inmate measures the time required to
>>> read a value from main memory. Using rdtsc, it measures the CPU cycles
>>> required for the access. The access can happen either cached or
>>> uncached. For an uncached access, the cache line is flushed before
>>> the access.
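
A minimal sketch of a single timed access, to illustrate the description above. This is not the code from the patch: it assumes GCC's <x86intrin.h> intrinsics (__rdtsc, _mm_clflush, _mm_mfence, _mm_lfence), and the function name time_access() and the variable probe are hypothetical. The fences follow the common pattern of fencing on both sides of each rdtsc so the load cannot be reordered around the timestamp reads.

```c
#include <stdint.h>
#include <x86intrin.h>

/* Hypothetical memory location whose access time is measured. */
static volatile uint64_t probe;

/*
 * Returns the TSC cycles spent on one load of *addr. If flush is
 * non-zero, the cache line is evicted first, so the load has to be
 * served from main memory (uncached case).
 */
static uint64_t time_access(volatile uint64_t *addr, int flush)
{
	uint64_t start, end, value;

	if (flush) {
		_mm_clflush((const void *)addr);
		_mm_mfence();	/* order the flush before the timed load */
	}

	_mm_lfence();		/* let earlier instructions finish before reading the TSC */
	start = __rdtsc();
	_mm_lfence();		/* keep the load from starting before the first rdtsc */
	value = *addr;		/* the timed access (volatile, so it is not optimized out) */
	_mm_lfence();		/* wait until the load has completed */
	end = __rdtsc();

	(void)value;
	return end - start;
}
```

Calling time_access(&probe, 1) followed by time_access(&probe, 0) yields one uncached and one cached sample. A single sample is noisy, which is why the tool averages over many rounds.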
>>>
>>> The tool repeats the measurement 10e6 times and outputs the average
>>> number of cycles required per access. Before the actual measurement,
>>> a dummy test determines the average overhead of a single measurement.
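
The dummy overhead test and the averaging over many rounds could be sketched as follows, again assuming GCC's x86 intrinsics. time_nothing() and average_cycles() are hypothetical names, not the patch's functions: time_nothing() runs the same fence/rdtsc sequence as a real measurement but omits the memory access, so its average approximates the cost of the measurement itself.

```c
#include <stdint.h>
#include <x86intrin.h>

/* One "empty" measurement: same fences and rdtsc pair, no memory access. */
static uint64_t time_nothing(void)
{
	uint64_t start, end;

	_mm_lfence();
	start = __rdtsc();
	_mm_lfence();
	/* a real measurement would perform the load here */
	_mm_lfence();
	end = __rdtsc();

	return end - start;
}

/* Runs measure() `rounds` times and returns the average in cycles. */
static uint64_t average_cycles(uint64_t (*measure)(void), uint64_t rounds)
{
	uint64_t sum = 0;

	for (uint64_t i = 0; i < rounds; i++)
		sum += measure();	/* 64 bits leave ample headroom for 10e6 rounds */

	return sum / rounds;
}
```

For example, average_cycles(time_nothing, 10000000) corresponds to the "Determining measurement overhead..." step for 10e6 rounds.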
>>>
>>> This is quite useful, because the tool gives a lot of insight into the
>>> differences between the root and the non-root cell: with little effort,
>>> it can also be run on Linux.
>>>
>>> If the 'overhead' time differs between the root and non-root cell, this
>>> may indicate timing or speed differences between the two cells.
>>>
>>> If the 'uncached' or 'cached' average time differs between the non-root
>>> and root cell, it's an indicator that both might have different hardware
>>> configurations / setups.
>>>
>>> The host tool can be compiled with:
>>> $ gcc -Os -Wall -Wextra -fno-stack-protector -mno-red-zone -o cache-timing ./inmates/tests/x86/cache-timings-host.c
>>>
>>> Signed-off-by: Ralf Ramsauer <[email protected]>
>>> ---
>>>
>>> Hi Jan,
>>>
>>> what do you think about a test inmate like this one? It's still an RFC
>>> patch, as I'm not sure whether the measurement setup is correct. In
>>> particular, I might have too many fences.
>>>
>>> This test could be extended to run permanently and show the results of
>>> the last 1e3, 1e5 and 1e6 runs. That way, the tool could be used to
>>> monitor the influence of the root cell on the non-root cell's caches.
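
One way the "last 1e3, 1e5 and 1e6 runs" idea could be implemented, sketched under assumptions: each window is a tumbling window (its average is reported each time it fills up, then it restarts) rather than a true sliding window, which avoids keeping a buffer of 1e6 samples. struct window, windows[] and window_add() are hypothetical, and printf stands in for whatever output function the inmate would actually use.

```c
#include <stdint.h>
#include <stdio.h>

/* Tumbling window: collects `size` samples, publishes their average, restarts. */
struct window {
	uint64_t size;
	uint64_t count;
	uint64_t sum;
	uint64_t last_avg;	/* average of the most recently completed window */
};

static struct window windows[] = {
	{ .size = 1000 },
	{ .size = 100000 },
	{ .size = 1000000 },
};

#define NUM_WINDOWS (sizeof(windows) / sizeof(windows[0]))

/* Feeds one measurement (in cycles) into every window. */
static void window_add(uint64_t cycles)
{
	for (size_t i = 0; i < NUM_WINDOWS; i++) {
		struct window *w = &windows[i];

		w->sum += cycles;
		if (++w->count == w->size) {
			w->last_avg = w->sum / w->size;
			w->count = 0;
			w->sum = 0;
			printf("last %llu runs: %llu cycles\n",
			       (unsigned long long)w->size,
			       (unsigned long long)w->last_avg);
		}
	}
}
```

A permanently running loop would call window_add() with each new sample; the 1e3 window then reports roughly 1000 times as often as the 1e6 window.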
>>
>> Such benchmarks aren't bad. However, in its current form it does not
>> qualify for the test folder yet IMHO: there is no functional test and no
>> easy evaluation of the benchmark results to derive a pass/fail criterion.
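
A pass/fail criterion could, for example, compare a measured average against a reference value with a relative tolerance. This is only a sketch: within_tolerance(), the choice of reference (e.g. native Linux numbers on the same machine) and the tolerance are assumptions, not something defined by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: passes if `measured` deviates from `reference`
 * by at most `tolerance_pct` percent (integer arithmetic only).
 */
static bool within_tolerance(uint64_t measured, uint64_t reference,
			     unsigned int tolerance_pct)
{
	uint64_t diff = measured > reference ? measured - reference
					     : reference - measured;

	return diff * 100 <= reference * tolerance_pct;
}
```

With the numbers from this thread, within_tolerance(82, 37, 20) fails (82 cycles of overhead vs. 37 natively), while within_tolerance(36, 37, 5) passes.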
> 
> Ack, will move it to demos/. Before posting a v2: did you have a chance
> to look at how the fences are used? I'm pretty sure I messed something
> up.
> 
>>
>>>
>>>
>>> Aaand btw: on a Xeon Gold 5118, we get the following values on Linux
>>> and in the non-root cell, respectively:
>>>
>>> Linux:
>>> $ ./cache-timing
>>> Measurement rounds: 10000000
>>> Determining measurement overhead...
>>>   -> Average measurement overhead: 37 cycles
>>> Measuring uncached memory access...
>>>   -> Average uncached memory access: 222 cycles
>>> Measuring cached memory access...
>>>   -> Average cached memory access: 9 cycles
>>>
>>
>> Linux native or Linux in Jailhouse?
>>
>>> Non-Root:
>>> Cell "apic-demo" can be loaded
>>> Started cell "apic-demo"
>>> CPU 3 received SIPI, vector 100
>>> Measurement rounds: 10000000
>>> Determining measurement overhead...
>>>   -> Average measurement overhead: 82 cycles
>>> Measuring uncached memory access...
>>>   -> Average uncached memory access: 247 cycles
>>> Measuring cached memory access...
>>>   -> Average cached memory access: 19 cycles
>>
>> How does this compare to Linux in Jailhouse (if the above was native)?
> 
> Ok, the following table shows the three numbers
> (overhead / uncached / cached, in CPU cycles):
> 
> Measurement            | Overhead | Uncached | Cached
> -----------------------+----------+----------+-------
> Linux native           |       37 |      222 |      9
> Linux root             |       37 |      226 |      9
> Linux non-root         |       37 |      215 |      9
> libinmate non-root [1] |       82 |      266 |     19
> libinmate non-root [2] |       36 |      217 |      8
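
To put the table into relative terms, the following sketch divides each row by the Linux native values. The numbers are copied from the table; the program is only an illustration. It puts libinmate non-root [1] at roughly 2.2x the native overhead, 1.2x the native uncached time and 2.1x the native cached time, while [2] is at or slightly below the native values.

```c
#include <stdio.h>

/* Averages from the table, in CPU cycles. */
struct result {
	const char *setup;
	double overhead, uncached, cached;
};

static const struct result results[] = {
	{ "Linux native",           37, 222,  9 },
	{ "Linux root",             37, 226,  9 },
	{ "Linux non-root",         37, 215,  9 },
	{ "libinmate non-root [1]", 82, 266, 19 },
	{ "libinmate non-root [2]", 36, 217,  8 },
};

#define NUM_RESULTS (sizeof(results) / sizeof(results[0]))

/* Prints each setup as a multiple of the Linux native numbers. */
static void print_ratios(void)
{
	const struct result *ref = &results[0];

	for (size_t i = 0; i < NUM_RESULTS; i++) {
		const struct result *r = &results[i];

		printf("%-24s %.2fx %.2fx %.2fx\n", r->setup,
		       r->overhead / ref->overhead,
		       r->uncached / ref->uncached,
		       r->cached / ref->cached);
	}
}
```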

Okay, fasten seatbelts, here's another one:

$ jh cell create my-cell
$ jh cell load my-cell apic-demo.bin
$ jh cell start my-cell
[snip]
Timer fired, jitter:    728 ns, min:    655 ns, max:    899 ns

And that one:
$ jh cell linux my-cell [...]
$ jh cell load my-cell apic-demo.bin
$ jh cell start my-cell
[snip]
Timer fired, jitter:    332 ns, min:    267 ns, max:    461 ns

Wow.

  Ralf

> 
> I get the numbers of [1] if I load cache-timings.bin into a freshly
> created cell, IOW:
> 
> $ jh cell create my-cell
> $ jh cell load my-cell cache-timings.bin
> $ jh cell start my-cell
> 
> Those numbers can be reproduced if I reload the cell (i.e., without
> destroying it). But in that very same cell, I get the numbers of [2] if
> I load/start Linux first and THEN reload the cell with cache-timings.bin
> (without destroying it in between). IOW:
> 
> $ jh cell load linux my-cell ...
> $ jh cell start my-cell
> $ jh cell load my-cell cache-timings.bin
> $ jh cell start my-cell
> 
> Interesting. This means that Linux must have left some configuration
> artefacts behind. It's still unclear what exactly.
> 
>>
>>>
>>> Cached access on Linux is 2x faster than in the non-root cell, if my
>>> test is correct. This can probably be explained by different cache
>>> configurations. Uncached access happens at almost the same speed.
>>>
>>> But do you have an explanation why the overhead measurement is more than
>>> 2x faster on Linux than in the non-root cell?
>>>
>>
>> Not yet, but I need the full picture first.
> 
> Hope the numbers above help.
> 
> Thanks!
>   Ralf
> 
>>
>> Jan
>>
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/536f0af0-f82d-a5a7-4d2f-8a7a73537c04%40oth-regensburg.de.