Dear All, I apologize if this e-mail seems like resurrecting a dead horse, only to flog the poor beastie one more time. We would certainly appreciate knowing whether anyone has gained more insight into this issue.
Back in April, there were about five e-mails on the subject "Slow performance on Trilinos?", initiated by Sum Thai Wong. The question concerned the increased time required to run the FiPy test suite with the "--trilinos" option, as compared with the default PySparse solvers. The consensus among the responders was that this was a consequence of extra overhead incurred when invoking Trilinos on the many small tests that make up the suite. (See http://search.gmane.org/?query=trilinos+performance&group=gmane.comp.python.fipy)

We recognize that this supposition is entirely reasonable, especially in light of the fact that some right-thinking users do indeed see a substantial performance improvement. Unfortunately, in our attempt to use FiPy with Trilinos to solve a set of phase field equations, we see a slowdown similar in magnitude -- around 30% -- to that reported by Sum Thai Wong. We are using a Debian release with Linux kernel 2.6.26 on a machine with four amd64 processors. There has been NO attempt to parallelize the code; in fact, to eliminate any possible slowdown due to interaction with OpenMPI, we built Trilinos in its serial incarnation, as shown by our Trilinos configure command:

    ./configure \
        CXXFLAGS="-O3" CFLAGS="-O3" \
        FFLAGS="-O5 -funroll-all-loops -malign-double" \
        --prefix=/.../TRILINOS_SERIAL \
        --cache-file=config.cache \
        --with-cxxflags=-fPIC --with-cflags=-fPIC --with-fflags=-fPIC \
        --with-gnumake \
        --with-python=/usr/bin/python \
        --enable-epetra --enable-aztecoo --enable-pytrilinos --enable-ml \
        --enable-ifpack --enable-amesos --enable-galeri

Our FiPy runs consist of hundreds of thousands of iterations, and in my opinion the performance hit that we see cannot be put down to the logistical or administrative details that may explain the additional time taken by the test suite scripts.
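For what it's worth, to convince ourselves that the 30% penalty really lives in the solve step and not in per-iteration housekeeping, we have been timing the solver call alone with a little harness along these lines. This is only a sketch: the `fake_solve` stand-in below replaces what in a real FiPy script would be the `eq.solve(...)` call, and the function and variable names are our own inventions, not anything from FiPy.

```python
import time

def time_solves(solve, n_sweeps):
    """Accumulate wall-clock time spent inside the solver call alone,
    excluding whatever per-sweep bookkeeping the caller does around it."""
    total = 0.0
    for _ in range(n_sweeps):
        t0 = time.perf_counter()
        solve()  # in a real script: eq.solve(var=phi, dt=dt) or similar
        total += time.perf_counter() - t0
    return total, total / n_sweeps

# Illustrative stand-in for the actual solver call.
def fake_solve():
    sum(i * i for i in range(1000))

total, per_sweep = time_solves(fake_solve, 100)
print(f"solver time: {total:.4f} s total, {per_sweep * 1e6:.1f} us/sweep")
```

Running the same script once under PySparse and once under Trilinos, and comparing only the accumulated solver time, would tell us whether the overhead is per-solve or per-run.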
A 4x speed-up, as others have reported, would bring a gleam to our weary eyes and a ray of sunshine to our dreary researches, since at present each data point takes a day to collect. Thanks for any suggestions or advice... especially the useful kind that solves the problem. Regards, J. Gathright

P.S. Just curious: would it be difficult to measure the time the tests spend in their critical portions? If the value reported by "setup.py test" is only elapsed wall-clock time, and non-solver housekeeping segments of the scripts can significantly influence the results, it might be helpful to have a figure of merit targeted at the really crucial code blocks.
