Dear All, I apologize if this e-mail seems like resurrecting a dead horse, only to flog the poor beastie one more time. We would certainly appreciate knowing whether anyone has gained more insight into this issue.
Back in April, there were about five e-mails on the subject "Slow performance on Trilinos?", initiated by Sum Thai Wong. The question concerned the increased time required to run the FiPy test suite with the "--trilinos" option, as compared with the default PySparse solvers. The consensus among the responders was that this was a consequence of extra overhead incurred when invoking Trilinos on the many small tests that make up the suite. (See http://search.gmane.org/?query=trilinos+performance&group=gmane.comp.python.fipy)

We recognize that this supposition is entirely reasonable, especially in light of the fact that some right-thinking users do indeed see a substantial performance improvement. Unfortunately, in our attempt to use FiPy with Trilinos to solve a set of phase field equations, we see a slowdown similar in magnitude -- around 30% -- to that reported by Sum Thai Wong. We are using a Debian release with Linux kernel 2.6.26 on a machine with four amd64 processors. There has been NO attempt to parallelize the code; in fact, to eliminate any possible slowdown due to interaction with OpenMPI, we built Trilinos in its serial incarnation, as shown by our Trilinos configure command:

    ./configure \
        CXXFLAGS="-O3" CFLAGS="-O3" \
        FFLAGS="-O5 -funroll-all-loops -malign-double" \
        --prefix=/.../TRILINOS_SERIAL \
        --cache-file=config.cache \
        --with-cxxflags=-fPIC --with-cflags=-fPIC --with-fflags=-fPIC \
        --with-gnumake \
        --with-python=/usr/bin/python \
        --enable-epetra --enable-aztecoo --enable-pytrilinos --enable-ml \
        --enable-ifpack --enable-amesos --enable-galeri

Our FiPy runs consist of hundreds of thousands of iterations, and in my opinion the performance hit that we see cannot be put down to the logistical or administrative details that may explain the additional time taken by the test suite scripts.
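For what it's worth, to convince ourselves that the 30% penalty really lives in the solve step and not in per-iteration housekeeping, we have been timing the solver call alone with a little harness along these lines. This is only a sketch: the `fake_solve` stand-in below replaces what in a real FiPy script would be the `eq.solve(...)` call, and the function and variable names are our own inventions, not anything from FiPy.

```python
import time

def time_solves(solve, n_sweeps):
    """Accumulate wall-clock time spent inside the solver call alone,
    excluding whatever per-sweep bookkeeping the caller does around it."""
    total = 0.0
    for _ in range(n_sweeps):
        t0 = time.perf_counter()
        solve()  # in a real script: eq.solve(var=phi, dt=dt) or similar
        total += time.perf_counter() - t0
    return total, total / n_sweeps

# Illustrative stand-in for the actual solver call.
def fake_solve():
    sum(i * i for i in range(1000))

total, per_sweep = time_solves(fake_solve, 100)
print(f"solver time: {total:.4f} s total, {per_sweep * 1e6:.1f} us/sweep")
```

Running the same script once under PySparse and once under Trilinos, and comparing only the accumulated solver time, would tell us whether the overhead is per-solve or per-run.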
A 4x speed-up, as others have reported, would bring a gleam to our weary eyes and a ray of sunshine to our dreary researches, since at present each data point takes a day to collect. Thanks for any suggestions or advice... especially the useful kind that solves the problem. Regards, J. Gathright

P.S. Just curious: would it be difficult to measure the time the tests spend in their critical portions? If the value reported by "setup.py test" is only elapsed wall-clock time, and non-solver housekeeping segments of the scripts can significantly influence the results, it might be helpful to have a figure of merit targeted at the really crucial code blocks.
