On Feb 7 2009, Jeff Squyres wrote:
> On Feb 7, 2009, at 12:23 PM, Brian W. Barrett wrote:
>
>> That is significantly higher than I would have expected for a single function call. When I did all the component tests a couple years ago, a function call into a shared library was about 5ns on an Intel Xeon (pre-Core 2 design) and about 2.5ns on an AMD Opteron.
>
> Good; I'm not crazy for thinking that this is a little too obvious -- it smells like I did something wrong. Could someone eyeball these files and see if I missed anything obvious:

At the risk of telling grandmothers how to suck eggs, have you tried
with different compilers, different systems and/or adding a few
irrelevant (but not optimisable-out) declarations or statements?

That sort of phenomenon is exactly what happens when you trip over a
cache problem - e.g. running out of cache associativity.  It can also
occur because of pipeline drain (e.g. branch misprediction) problems.
Neither of those would be found by eyeballing the code - you would at
least have to eyeball the assembler.
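
For anyone wanting to repeat the comparison, here is a minimal sketch of
the kind of timing loop that could be rebuilt under different compilers
and systems. It is not the code from the files under discussion; the
names callee, ns_per_call and best_ns_per_call are hypothetical, and it
assumes a POSIX clock_gettime() plus the GCC/Clang noinline attribute.
The call goes through a volatile function pointer so that the compiler
cannot inline it or optimise it out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime(); assumes POSIX */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Volatile sink so the callee's work cannot be optimised away. */
static volatile uint64_t sink;

/* Hypothetical callee; noinline is a GCC/Clang extension (assumption). */
__attribute__((noinline)) static void callee(uint64_t x)
{
    sink += x;
}

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average cost in ns of one indirect call, loop overhead included. */
static double ns_per_call(uint64_t iters)
{
    void (*volatile fp)(uint64_t) = callee;  /* reloaded on every call */
    assert(iters > 0);
    double t0 = now_ns();
    for (uint64_t i = 0; i < iters; i++)
        fp(1);
    double t1 = now_ns();
    return (t1 - t0) / (double)iters;
}

/* Smallest average over reps runs; reduces noise from other activity. */
static double best_ns_per_call(uint64_t iters, int reps)
{
    double best = ns_per_call(iters);
    for (int r = 1; r < reps; r++) {
        double t = ns_per_call(iters);
        if (t < best)
            best = t;
    }
    return best;
}
```

Calling best_ns_per_call(10000000, 5) from a small main() and printing
the result gives one number per build. If that number moves a lot when
only the compiler, the optimisation level or a few unrelated
declarations change, that points at code layout (cache or branch
effects) rather than at the call itself, and the assembler is the next
thing to look at.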


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  n...@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

