Hi all,

I have done some speedup tests on some of my code. The deal.II
archives seem to have a few messages about speedup in the assembly
routines, but none that I can see about speedup of the solver. Some of
the results were a surprise to me, so I would also like to confirm
whether they are expected.

I performed the tests with the very specific aim of demonstrating that
for the type of problem I am dealing with, (1) the solver generally
requires over 90% of the run time, and hence is the major area that
should be optimized, and (2) that the spatial problem size is slightly
too small to parallelize effectively (multiple time steps are
required). Therefore, I timed just the steps required to initialize
the preconditioner and solve the system. In MPI tests I specifically
excluded the work required to collect data back to "process 0". I also
timed my assembly routine but just to show that it contributes around
2-3% of the total work.

Under these idealized conditions, the Trilinos CG solver with SSOR
preconditioner performed very well in terms of the speedup attained.
The maximum deviation from linear speedup was 30% for up to 8
processors; for 1-4 processors it was around 10%. (These were measured
on a single 8-core Xeon chip. I am waiting for a job with two 4-core
chips to run so that I can show the (expected) performance drop as
off-chip communication affects the results. As I said, idealized
conditions.)

My surprise came when using the deal.II CG solver with SSOR as
preconditioner. A single-processor run took slightly less than half
the time the Trilinos solver required with one MPI process, which is
great, but I found virtually no speedup from 1 thread to 8 threads. I
(mistakenly?) thought that the deal.II vmult method was threaded and
should have shown at least some speedup. I don't expect threading in
the BLAS library that I used, so if the deal.II vmult method relies
heavily on BLAS then the lack of speedup is maybe not surprising. Can
anyone confirm whether I should expect this lack of speedup?

For the record, I controlled the number of threads deal.II used by
explicitly editing source/base/multithread_info.cc to set n_cpus to my
desired value. This required recompiling the library each time, but
that only took about 30 seconds because the change is so minor. It
also meant that my test program needed to be recompiled, which
prevented scripting the runs automatically. (The compute nodes do not
have access to the necessary header files to recompile programs.) I
controlled the number of threads that TBB used by the method I
described here:
http://www.dealii.org/~archiver/dealii/msg05676.html. I also compared
my results obtained using this method to results from the library
compiled with the flag --disable-threads. Interestingly, those runs
took longer (I expected the same runtime or possibly a shorter one). I
think the reason is that I was using a version of the library compiled
with Trilinos and PETSc support, so the library was bulkier overall.

I also used UMFPACK to solve the system. On the 8-core Xeon chip, the
deal.II CG solver + SSOR preconditioner beat UMFPACK by about 20% when
I reinitialized UMFPACK every time I needed to solve the system. On my
laptop the opposite occurs, and UMFPACK beats CG by about 16%. I would
expect the difference lies in the versions of BLAS I am using on the
different machines. When I initialize the UMFPACK factorization only
once and then reuse it for the remainder of the time steps (which my
test case allows, but which in general cannot be done), UMFPACK is an
order of magnitude faster than the rest, perhaps unsurprisingly.

Overall, I found that UMFPACK and CG + SSOR were competitive for my
system. Trilinos showed the best scaling, but only caught up to the
others at 4 processes. This is definitely because my spatial problem,
at 90 000 DoFs for a 2D scalar problem, is too small to really justify
parallelizing. I would appreciate it if someone could confirm that I
shouldn't expect speedup from the deal.II solver.

Hope this is of interest to someone, somewhere down the line and
thanks in advance for any thoughts.

Cheers,
Michael
_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii
