Hi,

On 21.10.2017 at 21:41, Wolf wrote:
> rdtsc cannot do it either. You need to have a CPU capable of
> understanding rdtscp.

From what I understood, that doesn't give you cycles either, but only the same timestamp intervals RDTSC returns.

Tangent: On Windows, RDTSC is wrapped by the QueryPerformanceCounter() call. QPC is incidentally complicated enough that very likely no out-of-order instructions are still pending by the time it actually gets to executing RDTSC, and it does so with less jitter than "abusing" CPUID for serialisation (QPC takes a rather stable 22 cycles). /Tangent

What I do to get cycle-accurate counts in microbenchmarks is first calibrate the timer with a task of known cycle length, obtain a "timestamps per cycle" figure from that (depending on core clock, usually between 200 and 1300), and then measure the actual function. Note that, depending on how well you can control multitasking, interrupts and power management, you will need thousands to millions of repeats of the function under test to be reasonably free from artefacts.

It looks a bit statistical, but it's precise enough to actually see the instruction cache, instruction alignment and branch prediction at work. I can also validate Agner Fog's Instruction Timing Tables with it, so it can't be that bad ;-)

-- 
Regards,
Martok

Ceterum censeo b32079 esse sanandam.

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
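
A minimal sketch of the calibrate-then-measure procedure described above, in C rather than Pascal. It is not the code used for the numbers in this mail. All names (read_ticks, dependent_adds, min_ticks, ticks_per_cycle, ticks_to_cycles) are made up for illustration. Two assumptions are baked in: the calibration task is a chain of dependent additions that is assumed to cost about one core cycle per iteration when built with optimisation on GCC or Clang, and the minimum over all repeats is used as the artefact filter, which is only one of several possible choices. The resulting ratio depends entirely on which timer read_ticks wraps.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a raw timestamp: RDTSC on x86, a C11 nanosecond clock elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
static uint64_t read_ticks(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical calibration task: n dependent additions.
   ASSUMPTION: about one core cycle per iteration in an optimised build.
   The empty asm statement stops the compiler from folding the loop. */
static uint64_t dependent_adds(uint64_t n) {
    uint64_t x = 0;
    for (uint64_t i = 0; i < n; i++) {
        x += 1;
        __asm__ volatile("" : "+r"(x));
    }
    return x;
}

typedef uint64_t (*task_fn)(uint64_t);

/* Smallest tick count over `repeats` runs of fn(arg). The minimum is the
   run least disturbed by interrupts, task switches and clock changes. */
static uint64_t min_ticks(task_fn fn, uint64_t arg, int repeats) {
    assert(repeats > 0);
    uint64_t best = UINT64_MAX;
    volatile uint64_t sink = 0;   /* keeps the call from being dropped */
    for (int r = 0; r < repeats; r++) {
        uint64_t t0 = read_ticks();
        sink = fn(arg);
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best) best = t1 - t0;
    }
    (void)sink;
    return best;
}

/* Calibration: timestamp ticks per core cycle, from the known-length task. */
static double ticks_per_cycle(uint64_t known_cycles, int repeats) {
    assert(known_cycles > 0);
    return (double)min_ticks(dependent_adds, known_cycles, repeats)
           / (double)known_cycles;
}

/* Convert a measured tick count into an estimated number of core cycles. */
static double ticks_to_cycles(uint64_t ticks, double tpc) {
    assert(tpc > 0.0);
    return (double)ticks / tpc;
}
```

Usage would be to call ticks_per_cycle once, then pass the result of min_ticks for the function under test to ticks_to_cycles. The calibration only holds while the core clock stays put, so it is best repeated close to each measurement.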