On Mon, 2008-06-02 at 14:55 -0700, ron minnich wrote:
> now I time the run 10 times (I can run longer but it seems good enough
> to establish behavior). I should get some rough idea of the cost of
> the branch.

I hate to say it, but these days you can't time anything in isolation.
The CPU is just too complex. It is no longer an x86, really. The
microcode is what you have to worry about. And how the microinstructions
get scheduled, and how they interact with each other, etc. etc.
You wouldn't believe the kind of crazy stuff you sometimes see on
the x86 compiler optimization team here at Sun. So, unless your
actual code is exactly like the benchmark you quoted, the results
could be completely off. I speak from experience: the experience of SPEC.

Thanks,
Roman.

P.S. With modern CPUs, even the most basic questions can have
surprising answers. I'm still shocked that, even by using some of the
close ties to Intel/AMD, I wasn't able to find an *authoritative*
source of information on a very basic thing: cache architecture for
modern x86 CPUs. Well, everybody assumes that they know how the cache
operates (e.g. what the "hash function" for the virtual/physical
address is, etc.):
   http://people.redhat.com/drepper/cpumemory.pdf
but none of the documentation from Intel/AMD explicitly confirms that.
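
For concreteness, here is a minimal sketch of the mapping people usually
assume: drop the line-offset bits of the address, then take the result
modulo the number of sets. The struct, the function names and the geometry
in the note below are hypothetical, chosen only for illustration, and the
modulo mapping itself is exactly the kind of assumption that the vendor
documentation does not explicitly confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry. The fields are illustrative assumptions,
   not vendor-confirmed parameters for any particular CPU. */
struct cache_geom {
    size_t size_bytes;  /* total capacity in bytes */
    size_t line_bytes;  /* cache line size in bytes */
    size_t ways;        /* associativity */
};

/* Number of sets = capacity / (line size * associativity). */
static size_t cache_num_sets(const struct cache_geom *g)
{
    return g->size_bytes / (g->line_bytes * g->ways);
}

/* Conventionally assumed mapping: discard the offset within the line,
   then reduce the line number modulo the number of sets.
   Real hardware may use a different (undocumented) function. */
static size_t cache_set_index(const struct cache_geom *g, uintptr_t addr)
{
    return (size_t)((addr / g->line_bytes) % cache_num_sets(g));
}
```

Under an assumed 32 KiB, 8-way cache with 64-byte lines, this gives 64
sets, so two addresses 0x1000 bytes apart land in the same set. Whether a
real part behaves that way at every cache level is the open question.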

