Hi,

On 21.10.2017 at 21:41, Wolf wrote:
> rdtsc cannot do it either. You need to have a CPU capable of 
> understanding rdtscp.
From what I understood, that doesn't give you cycles either, but only the same
timestamp intervals RDTSC returns.
Tangent: On Windows, RDTSC is wrapped by the QueryPerformanceCounter() call. QPC
happens to be complicated enough that very likely no out-of-order instructions
are still pending by the time it actually executes RDTSC, and it gets there with
less jitter than "abusing" CPUID for serialisation (QPC takes a rather stable
22 cycles). /Tangent

What I do to get cycle-accurate counts in microbenchmarks is first calibrate the
timer with a task of known cycle length, derive a "timestamps per cycle" ratio
from that (depending on core clock, usually between 200 and 1300), and then
measure the actual function. Note that, depending on how well you can control
multitasking, interrupts and power management, you will need thousands to
millions of repeats of the function under test to be reasonably free from
artefacts.
This looks a bit statistical, but it's precise enough to actually see the
instruction cache, instruction alignment and branch prediction at work. I can
also validate Agner Fog's Instruction Timing Tables with it, so it can't be that
bad ;-)
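
To make the calibrate-then-measure idea concrete, here is a rough sketch in C.
It is not my actual benchmark code, and all names in it (read_ticks,
dependent_adds, min_ticks, ticks_per_cycle, cycles_for) are made up for
illustration. It assumes GCC or Clang, that a chain of dependent additions
retires about one iteration per core cycle in an optimised build, and that
taking the minimum over many repeats is an acceptable way to discard runs
disturbed by interrupts. The ratio it produces depends on the timer source and
need not match the figures mentioned above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Raw timestamp counter: a timestamp, not a count of core cycles. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumption: on non-x86 builds fall back to a monotonic nanosecond clock. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

typedef uint64_t (*task_fn)(uint64_t);

/* Calibration task: n dependent additions. Assumed (not guaranteed) to take
   about one core cycle per iteration when compiled with optimisation. */
static uint64_t dependent_adds(uint64_t n) {
    uint64_t x = 0;
    for (uint64_t i = 0; i < n; i++) {
        x += 1;
#if defined(__GNUC__)
        /* Empty asm with x as in/out operand: stops the compiler from
           collapsing the loop into a single assignment. */
        __asm__ volatile("" : "+r"(x));
#endif
    }
    return x;
}

/* Smallest tick count observed over `repeats` runs of task(n). */
static uint64_t min_ticks(task_fn task, uint64_t n, unsigned repeats) {
    assert(repeats > 0);
    uint64_t best = UINT64_MAX;
    volatile uint64_t sink = 0; /* keeps the task's result alive */
    for (unsigned r = 0; r < repeats; r++) {
        uint64_t t0 = read_ticks();
        sink = task(n);
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    (void)sink;
    return best;
}

/* Timer ticks per assumed core cycle. Two chain lengths are measured and
   subtracted so that the fixed call and timer overhead cancels out. */
static double ticks_per_cycle(unsigned repeats) {
    const uint64_t n = 100000;
    uint64_t short_run = min_ticks(dependent_adds, n, repeats);
    uint64_t long_run = min_ticks(dependent_adds, 2 * n, repeats);
    return ((double)long_run - (double)short_run) / (double)n;
}

/* Estimated cycles for task(n), given a previously calibrated ratio. */
static double cycles_for(task_fn task, uint64_t n, unsigned repeats,
                         double tpc) {
    return (double)min_ticks(task, n, repeats) / tpc;
}
```

One would call ticks_per_cycle() once with a few thousand repeats, then pass
the result to cycles_for() together with the function under test. The repeat
counts are placeholders; how many you need depends on how quiet the machine is.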

-- 
Regards,
Martok

Ceterum censeo b32079 esse sanandam.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
