http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2007-01/msg00021.html

Re: Reliable timing using RDTSC



Hi,

Tim Roberts wrote:
> "Magnum Innominandum" <[EMAIL PROTECTED]> wrote:
>> Good point. I do have hyper-threading enabled. I'm not sure how this
>> will affect the timings, but I will try running my program with HT
>> disabled and see what happens.
>
> The problem is that Windows XP no longer synchronizes the cycle counters on
> multiple CPU machines. The two cycle counters can be millions of cycles
> apart. If your thread happens to switch processors during your timing,
> which is entirely possible, the time might appear to take a huge jump
> forward, or even jump backward. The delta is constant for any given
> reboot, but varies from boot to boot.

AFAIK it's worse than that, as (for hyper-threading) the performance of
one logical CPU is affected by the work done by another logical CPU,
and RDTSC counts "shared cycles". For example, if a piece of code takes
100 cycles while the other CPU is in a HLT state, then the same code
might take 200 cycles when the other CPU is also doing work.

AFAIK, for accurate measurement you need to use performance monitoring
counters with front-end tagging. Disabling hyper-threading for the
duration of the tests would be much easier... :-)

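Also, recent AMD CPUs have a RDTSCP instruction that waits for earlier
instructions to finish before reading the counter (AFAIK it isn't a
fully serialising instruction, as later instructions can still start
early), and returns the TSC (in EDX:EAX) and a "CPU identifier" (in ECX), so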
that you can do RDTSCP twice and compare the counts and check to see if
the same CPU was used. I'm not sure if Windows correctly sets the MSRs
for the "CPU identifiers" though - it may be that RDTSCP always returns
zero in ECX due to lack of OS support (and you probably can't easily
fix this yourself, as you need to run at CPL=0 to set the "TSC_AUX" MSR
on each CPU).
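
To make the "do RDTSCP twice and compare" idea concrete, here is a rough sketch in C. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `<x86intrin.h>`); the names `have_rdtscp`, `timed_call` and `spin` are made up for illustration. Note that if the OS leaves TSC_AUX at zero on every CPU, the comparison passes trivially and tells you nothing.

```c
#include <assert.h>
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp */

/* Returns 1 if the CPU advertises RDTSCP (CPUID.80000001h:EDX bit 27). */
int have_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 27) & 1u);
}

/* Hypothetical helper: times fn() with two RDTSCP reads.
 * Returns  0 and stores the tick delta if both reads saw the same
 *            TSC_AUX value,
 *         -1 if the values differ (the thread probably changed CPU),
 *         -2 if RDTSCP is not available. */
int timed_call(void (*fn)(void), uint64_t *ticks)
{
    unsigned int aux_start, aux_end;
    uint64_t t0, t1;

    assert(fn != 0 && ticks != 0);
    if (!have_rdtscp())
        return -2;

    t0 = __rdtscp(&aux_start);  /* TSC in return value, TSC_AUX in aux_start */
    fn();
    t1 = __rdtscp(&aux_end);

    if (aux_start != aux_end)
        return -1;              /* discard: the two reads are not comparable */
    *ticks = t1 - t0;
    return 0;
}

/* Example workload: a short busy loop the compiler cannot remove. */
void spin(void)
{
    volatile unsigned int i;

    for (i = 0; i < 1000; i++) { }
}
```

A caller would use `timed_call(spin, &ticks)` and simply repeat the measurement whenever it returns -1.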

IMHO (in general) there's a conflict with RDTSC in that some people
want it to measure real time (i.e. a fixed frequency counter), while
other people want to use it to measure code performance (or used CPU
cycles). These aren't the same thing due to power management (and other
things - hyper-threading, SMI/SMM, etc). Different power management
mechanisms make the TSC unsuitable for one use or the other (e.g.
compare the effects of Intel's SpeedStep and Intel's clock modulation
on the TSC).
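
One way to see which of the two behaviours a given machine has is to sample the TSC next to a wall clock and look at the ratio. The sketch below is only an illustration: it assumes GCC or Clang on x86, uses C11 `timespec_get` as the wall clock (which is not monotonic, so a clock adjustment during the test spoils the result), and the names `dual_sample`, `sample_now` and `ticks_per_us` are invented here.

```c
#include <stdint.h>
#include <time.h>       /* timespec_get, TIME_UTC (C11) */
#include <x86intrin.h>  /* __rdtsc */

/* Hypothetical record pairing the two quantities discussed above. */
struct dual_sample {
    uint64_t tsc;  /* raw TSC ticks */
    int64_t  ns;   /* wall-clock time in nanoseconds */
};

/* Reads the wall clock and the TSC back to back. */
struct dual_sample sample_now(void)
{
    struct dual_sample s;
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    s.ns  = (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
    s.tsc = __rdtsc();
    return s;
}

/* TSC ticks per microsecond of wall-clock time between two samples,
 * or 0.0 if no positive amount of time elapsed. */
double ticks_per_us(struct dual_sample a, struct dual_sample b)
{
    int64_t dns = b.ns - a.ns;

    if (dns <= 0)
        return 0.0;
    return (double)(b.tsc - a.tsc) * 1000.0 / (double)dns;
}
```

If the ratio stays the same while the CPU changes speed, the TSC is behaving as a fixed frequency counter and doesn't count used cycles; if the ratio moves with the CPU speed, it is no good as a real time clock.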

What I'd like to see is a pair of instructions and a pair of counters
(one for each purpose). For example, a "real time" counter, and a "used
cycle" counter that doesn't involve the messy (model specific)
performance monitoring counters.


Cheers,

Brendan
