Re: [HelenOS-devel] How to display/interpret benchmark results?

Vojtech Horky Fri, 26 Jan 2024 08:34:54 -0800

Hi, Jiri,

On Tue, 23 Jan 2024 at 13:48, Jiri Svoboda <jirik.svob...@seznam.cz> wrote:
> Currently hbench selects the number of cycles, takes 10 time measurements. It 
> then computes the average and standard deviation of the time taken. It then 
> computes and displays the rate from the average time and cycle count.
>
> I am really interested in the rate, not the time. What I am missing is a 
> number on the spread of the rate.
> What is the correct way of doing this?
> Is it correct to compute the rate from the average time? Or should we average 
> the rates and get SD from there?

I am not an expert but I have my $0.02 anyway.

First of all, why do we need to compute the rate at all? What is wrong
with displaying wall-clock time? I guess it is more natural for
multithreaded benchmarks where you want to take into account the
overall throughput across many CPUs but I am not sure if it applies
here?

Anyway, I guess the best would be to use the harmonic mean when
talking about rates. I have also stumbled across this question [1],
which may apply to us as well.

[1]
https://stats.stackexchange.com/questions/7471/can-the-standard-deviation-be-calculated-for-harmonic-mean
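
To make the harmonic mean point concrete, here is a minimal sketch in
Python (meant for analysis outside of HelenOS, not as hbench code). It
assumes that every run uses the same cycle count, and the numbers are
made up for illustration. Under that assumption, the rate computed from
the average time is exactly the harmonic mean of the per-run rates,
while the arithmetic mean of the rates comes out higher.

```python
from statistics import harmonic_mean, mean


def rate_from_mean_time(cycles, times):
    """Rate as cycle count divided by the average time per run."""
    return cycles / mean(times)


def per_run_rates(cycles, times):
    """One rate per measurement, assuming the same cycle count each run."""
    return [cycles / t for t in times]


# Hypothetical numbers, not real hbench output.
cycles = 1000
times = [2.0, 2.5, 4.0]  # seconds per run

rates = per_run_rates(cycles, times)

print("rate from mean time:     ", rate_from_mean_time(cycles, times))
print("harmonic mean of rates:  ", harmonic_mean(rates))
print("arithmetic mean of rates:", mean(rates))
```

The first two printed values agree (up to floating-point rounding); the
third is larger, which is why plainly averaging the rates would
overstate the throughput.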

And now for the opinionated part :-).

I would humbly advise against displaying only the average values. It
would be really great if there were an option to dump all the samples
to a file for later analysis (i.e., outside of HelenOS). When the
difference is an order of magnitude, displaying only the average is
probably fine. But for smaller differences one might want to use some
kind of statistical test (a t-test needs only the mean, SD and sample
count, but Mann-Whitney needs the individual samples if I remember
correctly) or compute confidence intervals (e.g. via bootstrap). And
for these, knowing the individual samples is a must.
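
As an illustration of why the individual samples matter, below is a
rough sketch of a percentile bootstrap confidence interval in Python.
The function name, the number of resamples and the sample values are
all my assumptions for the example; this is one simple bootstrap
variant, not a recommendation of a specific method.

```python
import random
from statistics import mean


def bootstrap_ci(samples, stat=mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for stat(samples).

    Resamples with replacement, computes the statistic on each resample,
    and returns the lower and upper percentile of those estimates.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(samples)
    estimates = sorted(
        stat(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lo_idx = int(alpha * n_resamples)
    hi_idx = int((1.0 - alpha) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical run times in seconds, not real measurements.
times = [2.01, 1.98, 2.05, 2.00, 2.10, 1.97, 2.03, 2.02, 1.99, 2.04]
lo, hi = bootstrap_ci(times)
print(f"mean = {mean(times):.3f} s, 95% CI = [{lo:.3f}, {hi:.3f}] s")
```

The same function works for other statistics (for example, passing
`stat=statistics.median`), which is not possible when only the mean
and SD were kept.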

I really like how the Renaissance benchmark suite does that: you can
dump all the data into JSON and it contains information about the
measurements as well as about the environment. I am unable to find an
online example but this Zenodo record [2] contains several raw JSON
dumps (they are very JVM-oriented but I like the general concept of
storing environment information, data format version and the actual
measurements in one file).

[2] https://zenodo.org/records/4492935
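
To show the general concept, here is a small Python sketch of such a
file. The field names and layout are hypothetical: they are neither
the actual Renaissance schema nor anything hbench produces today. The
only point is that the format version, the environment and the raw
samples travel together in one file.

```python
import json
import platform

FORMAT_VERSION = 1  # hypothetical version number of this made-up layout


def build_report(benchmark, cycles, times_s, environment=None):
    """Bundle raw samples with environment information in one dict."""
    env = {
        "machine": platform.machine(),
        "system": platform.system(),
    }
    env.update(environment or {})  # caller-supplied details, e.g. kernel version
    return {
        "format_version": FORMAT_VERSION,
        "benchmark": benchmark,
        "environment": env,
        "cycles": cycles,
        "times_s": list(times_s),  # every sample, not just mean and SD
    }


def dump_report(report, path):
    """Write the report as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)


# Hypothetical benchmark name and values.
report = build_report("demo_bench", 1000, [2.0, 2.5, 4.0],
                      environment={"note": "example only"})
print(json.dumps(report, indent=2))
```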

As a matter of fact I think we should dump as much (raw) information
as we have. It has happened to me many times already that I had to
re-measure something because I forgot to write down some information
about the environment, be it the precise kernel version, IP address or
L3 cache size. In HelenOS we do not have that much information
available but my experience is that one should not throw away any
information.

But because I am not actually implementing it, it is very easy to come
up with ideas that mean work for someone else. I think using the right
type of mean is the only relevant part. Please feel free to ignore the
rest :-)

Cheers,
- Vojtech

_______________________________________________
HelenOS-devel mailing list
HelenOS-devel@lists.modry.cz
http://lists.modry.cz/listinfo/helenos-devel
