> This is with the main program / SUNDIALS compiled with -O2, and the 
> generated code compiled with -O3 -ffast-math. I have disassembled the 
> computation functions in the -O3 -ffast-math code, and it looks 
> reasonable, there are no CALL instructions anymore (the built-in exp and 
> log from gcc get inlined). I therefore doubt that differences in the 
> quality of the generated code are the cause of the problem. It is 
> possible that Alan has managed to get the better benchmark by compiling 
> CVODE with -O3 -ffast-math or other optimisations.
Turning on -O3 when compiling SUNDIALS actually makes it worse, presumably 
because it increases the code size and therefore the number of cache 
misses. Compiling everything with -O3 -fomit-frame-pointer -ffast-math gave:
real    26m40.323s
user    26m35.056s
sys     0m2.660s

Recompiling everything with -O2 -fomit-frame-pointer -ffast-math:
real    25m4.259s
user    24m58.534s
sys     0m3.040s

> Another possibility 
> would be that his CellML 1.0 ten Tusscher model behaves differently. Yet 
> another possibility would be that the differences could be arising from 
> the structure of the CVODE stepping loop, or differences in some 
> parameters given to the solver.  My stepping loop looks like this: 
> https://svn.physiomeproject.org/svn/physiome/CellML_DOM_API/trunk/CIS/sources/CISSolve.cxx,
> see function SolveODEProblemCVODE.
I have run the version of the model from Alan with the integrator code 
compiled using -O2 -fomit-frame-pointer -ffast-math (generated code 
compiled with -O3 -ffast-math), and I got the following:

real    22m25.521s
user    22m21.168s
sys     0m2.400s
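
To put the three timings side by side, here is a minimal Python sketch that converts the "user" figures quoted above into seconds and reports the relative differences. The helper names (parse_time, percent_faster) and the run labels are only illustrative and are not part of CIS or SUNDIALS. The last comparison mixes a change of model with a fixed set of flags, so it is not a like-for-like compiler comparison.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '26m35.056s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_faster(baseline, candidate):
    """Percentage by which `candidate` is shorter than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# "user" times copied from the runs quoted in this message.
# The labels are informal descriptions, not official names.
user_times = {
    "everything -O3": parse_time("26m35.056s"),
    "everything -O2": parse_time("24m58.534s"),
    "Alan's model, integrator -O2": parse_time("22m21.168s"),
}

if __name__ == "__main__":
    o3 = user_times["everything -O3"]
    o2 = user_times["everything -O2"]
    alan = user_times["Alan's model, integrator -O2"]
    print(f"-O2 vs -O3 build:       {percent_faster(o3, o2):.1f}% less user time")
    print(f"Alan's model vs -O2 run: {percent_faster(o2, alan):.1f}% less user time")
```

The "user" time is used rather than "real" because it excludes time spent waiting on other processes; in these runs the two differ by only a few seconds.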

I will look into where the time is being spent, to see if this can be 

Best regards,

cellml-discussion mailing list
