> This is with the main program / SUNDIALS compiled with -O2, and the
> generated code compiled with -O3 -ffast-math. I have disassembled the
> computation functions in the -O3 -ffast-math code, and it looks
> reasonable, there are no CALL instructions anymore (the built-in exp and
> log from gcc get inlined). I therefore doubt that differences in the
> quality of the generated code is the cause of the problem. It is
> possible that Alan has managed to get the better benchmark by compiling
> CVODE with -O3 -ffast-math or other optimisations.

Turning on -O3 when compiling SUNDIALS actually makes it worse, presumably
because it increases the code size and therefore the number of cache
misses. Compiling everything with -O3 -fomit-frame-pointer -ffast-math
gave:

real    26m40.323s
user    26m35.056s
sys     0m2.660s

Recompiling everything with -O2 -fomit-frame-pointer -ffast-math gave:

real    25m4.259s
user    24m58.534s
sys     0m3.040s

> Another possibility
> would be that his CellML 1.0 ten Tusscher model behaves differently. Yet
> another possibility would be that the differences could be arising from
> the structure of the CVODE stepping loop, or differences in some
> parameters given to the solver. My stepping loop looks like this:
> https://svn.physiomeproject.org/svn/physiome/CellML_DOM_API/trunk/CIS/sources/CISSolve.cxx,
>
> see function SolveODEProblemCVODE.
>

I have run the version of the model from Alan with the integrator code
compiled using -O2 -fomit-frame-pointer -ffast-math (generated code
compiled with -O3 -ffast-math), and I got the following:

real    22m25.521s
user    22m21.168s
sys     0m2.400s

I will look into where the time is being spent, to see if this can be
improved.

Best regards,
Andrew
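
To put the three timings in this message side by side, here is a minimal Python sketch (not part of the original benchmark) that converts the reported user times into seconds and expresses each as a percentage change relative to the all -O3 build. The labels and the helper names `to_seconds` and `relative_change` are illustrative assumptions. The third run used Alan's version of the model, so it is not a like-for-like comparison of compiler flags.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '26m35.056s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def relative_change(baseline, other):
    """Percentage change of `other` relative to `baseline` (negative = faster)."""
    return (other - baseline) / baseline * 100

# User times copied from the runs reported in this message.
# The labels are shorthand chosen for this sketch.
user_times = {
    "everything -O3": "26m35.056s",
    "everything -O2": "24m58.534s",
    "Alan's model version, integrator -O2": "22m21.168s",
}

seconds = {label: to_seconds(stamp) for label, stamp in user_times.items()}
baseline = seconds["everything -O3"]

for label, secs in seconds.items():
    change = relative_change(baseline, secs)
    print(f"{label}: {secs:.3f} s ({change:+.1f}% vs. everything -O3)")
```

User time is used rather than wall-clock time because it is less sensitive to other activity on the machine; swapping in the `real` figures only needs a change to the dictionary.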
