I realized at least a bit of the problem. My Trilinos 10.2 build has
-D CMAKE_BUILD_TYPE:STRING=DEBUG \
set (as do many of the example scripts that I cadged this from). Changing this
to
-D CMAKE_BUILD_TYPE:STRING=RELEASE \
enables some optimizations (and also produces consistent segfaults when
built with MPI support).
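For reference, a minimal sketch of the relevant part of the configure script; the install prefix and source path are hypothetical, and the real script passes many more options:

```shell
#!/bin/sh
# Hypothetical minimal configure for a serial, optimized Trilinos 10.2
# build; adjust paths for your tree. MPI is off here because the RELEASE
# build currently segfaults when built with MPI support.
cmake \
  -D CMAKE_BUILD_TYPE:STRING=RELEASE \
  -D TPL_ENABLE_MPI:BOOL=OFF \
  -D Trilinos_ENABLE_PyTrilinos:BOOL=ON \
  -D CMAKE_INSTALL_PREFIX:PATH=$HOME/trilinos-10.2 \
  ../Trilinos
```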
With a *serial* RELEASE build, I now get:
                                   total  _prepareLinearSystem  _solve (i)  precon
PySparse default (PCG/SSOR)         73.0                  60.1         8.9
Trilinos default (GMRES/DD)        143.8                  57.2        83.7    27.0
Trilinos, PCG, no precon            97.5                  55.9        37.7
With optimization, the Trilinos solves are about 30% faster (when I scale the
new and old _prepareLinearSystem and PySparse._solve times). PySparse still
leaves Trilinos in the dust, but it's something. Now to figure out why I can't
build it against MPI.
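The ~30% figure comes from scaling the old DEBUG-build _solve times by the ratio of _prepareLinearSystem times, on the assumption that the prepare step is solver-independent and so serves as a rough proxy for overall build speed. A sketch of that arithmetic, using the averages of the two DEBUG runs against the single RELEASE run:

```python
# Scale the DEBUG-build solve times by the ratio of _prepareLinearSystem
# times (a rough proxy for build speed, since that phase is solver-independent).

def scaled_speedup(old_prepare, old_solve, new_prepare, new_solve):
    scale = old_prepare / new_prepare
    return 1.0 - (new_solve * scale) / old_solve

# Trilinos default (GMRES/DD): DEBUG averages vs. RELEASE.
default = scaled_speedup(old_prepare=50.35, old_solve=115.45,
                         new_prepare=57.2, new_solve=83.7)

# Trilinos PCG, no preconditioner: DEBUG averages vs. RELEASE.
pcg = scaled_speedup(old_prepare=50.55, old_solve=47.05,
                     new_prepare=55.9, new_solve=37.7)

print("default: %.0f%%, PCG: %.0f%%" % (default * 100, pcg * 100))
# → default: 36%, PCG: 28%  (about 30% on average)
```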
On May 26, 2010, at 5:21 PM, I wrote:
>
>
> On May 25, 2010, at 9:37 PM, Daniel Wheeler wrote:
>
>> Maybe try without any preconditioner or with only Jacobi.
>
> Here's what I get for a variety of configurations (2 runs each, showing
> pretty good run-to-run consistency).
>
> The default solver and preconditioner definitely seem bad, at least for this
> problem. Of this batch, Trilinos' PCG solver with no preconditioner at all is
> preferred, but is still substantially slower than PySparse's PCG.
>
>                                    total  _prepareLinearSystem  _solve (i)  precon
> PySparse default (PCG)              71.9                  49.4         7.3
> PySparse default                    61.6                  50.2         7.4
> Trilinos default (ii)              170.6                  50.4       116.2    42.2
> Trilinos default                   168.6                  50.3       114.7    42.0
> Trilinos, GMRES, no precon (iii)   117.2                  50.7        62.4
> Trilinos, GMRES, no precon         116.6                  50.2        62.7
> Trilinos, GMRES, Jacobi            123.9                  51.0        69.1
> Trilinos, GMRES, Jacobi            120.2                  50.2        66.0
> Trilinos, PCG, no precon           101.1                  49.9        47.2
> Trilinos, PCG, no precon           101.9                  51.2        46.9
>
>
> (iv)
> mpirun -np 2, GMRES, no precon      75.5                  29.1        42.5
> mpirun -np 2, PCG, no precon        68.4                  31.7        32.6
>
>
>
> (i) includes preconditioning
>
> (ii) GMRES, MultiLevelDDPreconditioner
>
> (iii) obtained with
>
> solver = DefaultSolver(precon=None)
>
> and
>
> phaseEq.solve(phase, dt=dt, solver=solver)
> heatEq.solve(dT, dt=dt, solver=solver)
>
> (iv) the profiling results for the parallel runs are dubious; they likely
> represent only a single process, at best. Moreover, the .prof file was
> garbled the second time I tried this, for both configurations. Still, the
> numbers seem to scale consistently with wall-clock time.
>
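P.S. On footnote (iv): the .prof files suggest Python's profiler was used. A minimal sketch of pulling per-phase cumulative times out of a profile with the stdlib cProfile/pstats; the _prepareLinearSystem and _solve bodies below are dummy stand-ins, not FiPy's actual methods:

```python
import cProfile
import pstats
import time

# Dummy stand-ins for the FiPy phases being timed; the real runs
# profile fipy's _prepareLinearSystem and _solve.
def _prepareLinearSystem():
    time.sleep(0.05)

def _solve():
    time.sleep(0.02)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(3):
    _prepareLinearSystem()
    _solve()
profiler.disable()

# pstats keys its stats dict by (filename, lineno, funcname); the
# value tuple is (call count, primitive calls, total time, cumulative
# time, callers). Cumulative time is what the tables above report.
stats = pstats.Stats(profiler)
times = {}
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in stats.stats.items():
    if funcname in ("_prepareLinearSystem", "_solve"):
        times[funcname] = ct  # cumulative seconds in each phase

print(times)
```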