On Sun Sep 4 13:48:31 EDT 2011, [email protected] wrote:
> after the recent discussions on nsec()...
>
> does anyone already have the snippet of code to do fine grained
> timings on the x86 platform using the hardware performance counters?
>
> I would use nsec() but I am timing system calls so I expect my results
> would be swamped by nsec()'s performance.
i wrote up a little demo using a variant of nsec and
using the x86 cycle counter, RDTSC.
the source is in /n/sources/contrib/quanstro/highprec.
i'd recommend doing timings on your particular hardware.
here are my results:

; aux/cpuid -i
AMD Phenom(tm) II X4 965 Processor
; 8.out
nsec latency 25729ns
nsec latency 24554ns
cycle hz = 3393000000
cycles latency 88 cycles; 25 ns
cycles latency 78 cycles; 22 ns

ladd; aux/cpuid -i
Intel(R) Atom(TM) CPU 330 @ 1.60GHz
ladd; 8.out
nsec latency 39501ns
nsec latency 38901ns
cycle hz = 1604000000
cycles latency 60 cycles; 37 ns
cycles latency 48 cycles; 29 ns

new; aux/cpuid -i
Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
new; 8.out
nsec latency 8591ns
nsec latency 9155ns
cycle hz = 3105000000
cycles latency 28 cycles; 9 ns
cycles latency 28 cycles; 9 ns

chula; aux/cpuid -i
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
chula; 8.out
nsec latency 14319ns
nsec latency 14451ns
cycle hz = 2660000000
cycles latency 40 cycles; 15 ns
cycles latency 32 cycles; 12 ns

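
the ns figure printed next to each cycle count above is consistent with
dividing the count by the reported cycle hz and truncating. a minimal
sketch of that conversion in portable c follows. cycles2ns is a made-up
name for illustration and this is not taken from the highprec source.

```c
#include <stdint.h>

/*
 * convert a cycle count to whole nanoseconds, truncating.
 * hz is the counter frequency (the "cycle hz" line above).
 * assumption: cycles is small enough that cycles*1e9 fits in
 * 64 bits, i.e. intervals of a few seconds at most.
 */
static uint64_t
cycles2ns(uint64_t cycles, uint64_t hz)
{
	return cycles * 1000000000ULL / hz;
}
```

for example, cycles2ns(88, 3393000000ULL) gives 25, which matches the
first cycles line of the Phenom run.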
it seems like you can get ±10ns at a few tens of
ns latency with _cycles, and ±10µs at a few tens
of µs latency with /dev/bintime.
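
to repeat this kind of latency measurement elsewhere, one approach is to
read the clock twice back to back and keep the smallest gap seen. the
sketch below assumes a posix system and swaps in
clock_gettime(CLOCK_MONOTONIC) for nsec and RDTSC, so it is not the
highprec code and its numbers are not comparable to the ones above.
now_ns and timer_latency_ns are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* read the monotonic clock in ns; stand-in for nsec() or a cycle counter */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * smallest gap between two adjacent clock reads over the given
 * number of trials. returns UINT64_MAX if trials <= 0.
 */
static uint64_t
timer_latency_ns(int trials)
{
	uint64_t t0, t1, d, best;
	int i;

	best = UINT64_MAX;
	for(i = 0; i < trials; i++){
		t0 = now_ns();
		t1 = now_ns();
		d = t1 - t0;
		if(d < best)
			best = d;
	}
	return best;
}
```

when timing something short like a system call, the value returned by
timer_latency_ns(1000) is a rough floor on what the clock itself adds to
each measurement.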
- erik