On Feb 7 2009, Jeff Squyres wrote:
On Feb 7, 2009, at 12:23 PM, Brian W. Barrett wrote:
That is significantly higher than I would have expected for a single
function call. When I did all the component tests a couple years
ago, a function call into a shared library was about 5ns on an Intel
Xeon (pre-Core 2 design) and about 2.5ns on an AMD Opteron.
Good; I'm not crazy for thinking that this is a little too obvious --
it smells like I did something wrong. Could someone eyeball these
files and see if I missed anything obvious:
At the risk of telling grandmothers how to suck eggs, have you tried
with different compilers, different systems and/or adding a few
irrelevant (but not optimisable-out) declarations or statements?
That sort of phenomenon is exactly what happens when you trip over a
cache problem - e.g. running out of cache associativity. It can also
occur because of pipeline drain (e.g. branch misprediction) problems.
Neither of those would be found by eyeballing the code - you would at
least have to eyeball the assembler.
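
For what it is worth, here is a minimal sketch in C of the sort of timing
harness that makes such experiments easy to repeat. It is not taken from
the files under discussion: callee, time_calls and best_of are hypothetical
names, and the call goes through a volatile function pointer only as a
rough stand-in for a call into a shared library. It assumes a POSIX system
with clock_gettime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime on strict compilers */
#endif
#include <time.h>

/* Hypothetical stand-in for the function whose call cost is measured. */
static unsigned callee(unsigned x) { return x + 1u; }

/* Elapsed nanoseconds between two timestamps. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9
         + (double)(b.tv_nsec - a.tv_nsec);
}

/* Make `iters` calls through a volatile function pointer, so the compiler
 * can neither inline the call nor hoist it out of the loop, and return the
 * average cost per call in nanoseconds. Each call depends on the previous
 * result, and the final value is stored through `sink` so the loop cannot
 * be optimised away. The figure includes the loop overhead. */
static double time_calls(long iters, volatile unsigned *sink)
{
    unsigned (*volatile fp)(unsigned) = callee;
    struct timespec t0, t1;
    unsigned acc = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        acc = fp(acc);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = acc;
    return elapsed_ns(t0, t1) / (double)iters;
}

/* Repeat the measurement and keep the minimum, which is usually the
 * figure least disturbed by interrupts and other system noise.
 * Assumes trials >= 1. */
static double best_of(int trials, long iters, volatile unsigned *sink)
{
    double best = time_calls(iters, sink);
    for (int t = 1; t < trials; t++) {
        double ns = time_calls(iters, sink);
        if (ns < best)
            best = ns;
    }
    return best;
}
```

Calling best_of(10, 10000000, &sink) from a main of your own and printing
the result gives one number per build. If that number moves a lot when you
change compiler or optimisation level, or add an unrelated static array or
a few statements that shift the code and data layout, that points at cache
or pipeline effects rather than at the call itself.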
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: n...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679