On 9/25/11 11:56 PM, Robert Jacques wrote:
On Sun, 25 Sep 2011 23:06:01 -0400, Andrei Alexandrescu
<[email protected]> wrote:
So, std.benchmark should
[ ] Set the affinity to a single thread at start (i.e. use
SetProcessAffinityMask, etc)
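The checklist item above could be sketched like this on Linux, where `os.sched_setaffinity` plays the role of the Windows SetProcessAffinityMask call Robert names (this is an illustration of the suggestion, not something std.benchmark currently does; the function name is mine):

```python
import os

def pin_to_one_cpu():
    # Pick one CPU the process is already allowed to run on and
    # restrict the process (pid 0 == ourselves) to just that CPU,
    # so the whole timing run stays on a single core.
    first = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {first})
    return first
```

After calling `pin_to_one_cpu()`, every subsequent measurement executes on the same core, which sidesteps the cache-poisoning and TSC issues discussed below.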

I think that won't influence most measurements measurably.

First, a core jump is guaranteed to, essentially, poison the cache.
Which, at least in my eyes, invalidates that particular timing run.

Well it will be one of many runs. The questions are whether (a) core jumps are likely to happen during a benchmark run, and (b) enough of them happen to influence a 500ms time window.

Second, timing generally relies on the CPU's Time Stamp Counter, which is
not multi-thread safe; a core switch invalidates all previous TSC
values, and hence, the time measurement itself. Furthermore, the TSC is
not even guaranteed to have a fixed frequency on some CPUs. Now there
are ways around the problems of the TSC, but even so:

(From Wikipedia)
"Under Windows platforms, Microsoft strongly discourages using the TSC
for high-resolution timing for exactly these reasons, providing instead
the Windows APIs QueryPerformanceCounter and
QueryPerformanceFrequency.[2] Even when using these functions, Microsoft
recommends the code to be locked to a single CPU."

std.benchmark uses QueryPerformanceCounter on Windows and clock_gettime/gettimeofday on Unix.
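For illustration, Python's `time.perf_counter` wraps exactly the primitives named above: QueryPerformanceCounter on Windows and clock_gettime(CLOCK_MONOTONIC) on Unix, so it is monotonic and unaffected by TSC quirks. A minimal timing sketch (the workload here is arbitrary):

```python
import time

t0 = time.perf_counter()
total = sum(range(100_000))        # some workload to time
elapsed = time.perf_counter() - t0
print(f"workload took {elapsed * 1e9:.0f} ns")
```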

[ ] Repeat the measurement N times, taking the min.

std.benchmark repeats the measurement in a loop, discounts the time
spent in the iteration proper, and divides total time by the number of
iterations to figure the time per iteration. This has the advantage that
it works even with very fast functions, without letting the counter itself
affect the timing.
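The scheme described above can be sketched as follows: time the function in a loop, subtract the cost of an equally long empty loop, and divide by the iteration count (the names here are illustrative, not std.benchmark's actual API):

```python
import time

def ns_per_call(f, n):
    # Time n calls of f.
    t0 = time.perf_counter_ns()
    for _ in range(n):
        f()
    body = time.perf_counter_ns() - t0

    # Time an empty loop of the same length to estimate loop overhead.
    t0 = time.perf_counter_ns()
    for _ in range(n):
        pass
    overhead = time.perf_counter_ns() - t0

    # Discount the loop overhead and report nanoseconds per call.
    return max(body - overhead, 0) / n
```

For example, `ns_per_call(lambda: sum(range(100)), 10_000)` yields a per-call estimate even though a single call is far too fast to time directly.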

Which is necessary but not sufficient for proper benchmarking. The
timers themselves are noisy, to say nothing of the effects of context
switches, warm-up, etc. on a run. During the ptr+length vs ptr+ptr
discussion on Tuesday, naive use of benchmark led to some very wrong
conclusions simply because any single run had up to ~5% error. The
only way to get rid of this error is to make multiple runs.
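The multiple-runs point, sketched: repeat the whole timed run several times and keep the minimum, on the reasoning that noise (context switches, timer jitter) only ever adds time. The helper names below are mine, not from any benchmark module:

```python
import time

def best_of(runs, timed_run):
    # Take the minimum over several complete timed runs;
    # the fastest run is the one least perturbed by noise.
    return min(timed_run() for _ in range(runs))

def timed_run():
    t0 = time.perf_counter_ns()
    sum(range(10_000))              # workload under test
    return time.perf_counter_ns() - t0
```

Calling `best_of(5, timed_run)` then reports the least-noisy of five measurements.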

You may want to take a look at the code. This is a long argument ending with "...make multiple runs" as if std.benchmark were not doing them and you suggest it should. Again, std.benchmark runs the tested function in a loop, increasing the iteration count in powers of 10 until the entire run lasts at least 500ms.

[X] Adaptively increase the number of loop iterations to ensure a valid
reading.

The current system tries 10, 100, 1000 iterations until it gets to a
total run time of at least 0.5 seconds. That's quite a generous margin,
I plan to tighten it.
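The adaptive scheme described above can be sketched like this: grow the iteration count in powers of 10 until one timed run covers the target window (0.5 seconds in std.benchmark; pass a much smaller target for quick experiments). A sketch with illustrative names, not the actual std.benchmark code:

```python
import time

def calibrate(f, target_ns):
    # Try 10, 100, 1000, ... iterations until one full run of f
    # lasts at least target_ns nanoseconds.
    n = 10
    while True:
        t0 = time.perf_counter_ns()
        for _ in range(n):
            f()
        if time.perf_counter_ns() - t0 >= target_ns:
            return n
        n *= 10
```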

[ ] Adaptively decrease the number of loop iterations to ensure minimal
context switching.

OK.

In fact, upon further thinking, the iterations should not be too few. Context switches and other activity are inevitable, but their effects are negligible over long periods (such as half a second).

To conclude, I'm unsure what steps you suggest to take to improve std.benchmark, and I'd be grateful if you could clarify them after perusing its code and documentation.


Thanks,

Andrei
