Hello everybody,

For one part of a calculation I have to solve several (about 6 to 15) linear systems of the form

M x_i = L_i b

M is the mass matrix; the matrix L_i is the discretisation of an integral operator and is therefore denser than M. Each L_i can take several hundred MB in some cases. The size of the problems is moderate, ranging from a few tens of thousands to a few hundred thousand dofs.

This step takes a significant part of the run time, so I was wondering whether it might be possible to make better use of the 4 cores of my computer. As these linear systems are independent, it should be possible to solve them in parallel. I do not intend to distribute this calculation to multiple machines. However, I will get access to a machine with 12 cores in the next few weeks.

In principle I see several different ways to do that. One could either use tasks or threads to solve the linear systems simultaneously, or use Trilinos or PETSc to solve them one after another, each solve using multiple MPI processes.

I did some tests with tasks, using a task group to solve 8 systems with around 60000 dofs each in parallel, using an incomplete LU decomposition. Each task has to use its own solver instance, otherwise everything fails. The CG solver takes around 5 iterations. However, I only observe a speedup of at most 10 % over the serial solution, which I find a bit disappointing.

Could this be due to a memory bottleneck? And if so, what are the chances of getting better results with Trilinos or PETSc?

Thank you very much for your efforts

Johannes



_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii
