Hi all, I have done some speed-up tests on some of my code. The deal.II archives contain a few messages about speed-up in the assembly routines, but none that I can see about speed-up of the solver. Some of the results surprised me, so I would also like to confirm whether they are expected.
I performed the tests with the very specific aim of demonstrating that, for the type of problem I am dealing with, (1) the solver generally requires over 90% of the run time, and hence is the main area that should be optimized, and (2) the spatial problem size is slightly too small to parallelize effectively (multiple time steps are required). Therefore, I timed just the steps required to initialize the preconditioner and solve the system. In the MPI tests I specifically excluded the work required to collect data back to process 0. I also timed my assembly routine, but only to show that it contributes around 2-3% of the total work.

Under these idealized conditions, the Trilinos CG solver with SSOR preconditioner performed very well in terms of speed-up attained. The maximum deviation from linear speed-up was 30% for up to 8 processors; for 1-4 processors it was around 10%. (These were measured on a single 8-core Xeon chip. I am waiting for a job with two 4-core chips to run so that I can show the expected performance drop as off-chip communication affects the results. As I said, idealized conditions.)

My surprise came when using the deal.II CG solver with SSOR as preconditioner. On a single processor it took slightly less than half the time the Trilinos solver required with one MPI process, which is great, but I found virtually no speed-up from 1 thread to 8 threads. I (mistakenly?) thought that the deal.II vmult method was threaded and should have shown at least some speed-up. I don't expect threading in the BLAS library that I used, so if the deal.II vmult method relies heavily on BLAS then the lack of speed-up is perhaps not surprising. Can anyone confirm whether I should expect this lack of speed-up?

For the record, I controlled the number of threads deal.II used by explicitly editing source/base/multithread_info.cc to set n_cpus to my desired value.
This required recompiling the library each time, but that only took about 30 seconds because the change is so minor. It also meant that my test program needed to be recompiled, which prevented an automated script. (The compute nodes do not have access to the header files necessary to recompile programs.) I controlled the number of threads that TBB used by the method I described here: http://www.dealii.org/~archiver/dealii/msg05676.html.

I also compared my results obtained using this method to results from the library compiled with the flag --disable-threads. Interestingly, those runs took longer (I expected the same runtime or possibly shorter). I think the reason is that I was using a version of the library compiled with Trilinos and PETSc support, so the library was bulkier overall.

I also used UMFPACK to solve the system. On the 8-core Xeon chip, the deal.II CG solver + SSOR preconditioner beat UMFPACK by about 20% when I reinitialized UMFPACK every time I needed to solve the system. On my laptop the opposite occurs, and UMFPACK beats CG by about 16%. I expect the difference lies in the versions of BLAS I am using on the different machines. When I only initialize the UMFPACK factorization once and then reuse it for the remainder of the time steps (which my test case allows, but which in general cannot be done), UMFPACK is an order of magnitude faster than the rest, perhaps unsurprisingly.

Overall, I found that UMFPACK and CG + SSOR were competitive for my system. Trilinos showed the best scaling, but only caught up to the others at 4 processes. This is almost certainly because my spatial problem, at 90 000 DoFs in a 2D scalar problem, is too small to really justify parallelizing.

I would appreciate it if someone could confirm that I shouldn't expect speed-up from the deal.II solver. I hope this is of interest to someone, somewhere down the line, and thanks in advance for any thoughts.
Cheers,
Michael

_______________________________________________
dealii mailing list
http://poisson.dealii.org/mailman/listinfo/dealii
