> I did some tests with Tasks, using a taskgroup to solve 8 systems with
> around 60000 dofs in parallel, using an incomplete LU decomposition. Each
> task has to use its own solver instance, otherwise everything fails. The CG
> solver takes around 5 iterations. However, I only observe a speedup of
> <= 10% over the serial solution, which I find a bit disappointing.
>
> Could this be due to a memory bottleneck? And if this is the case, what are
> the chances of getting better results with Trilinos or PETSc?

I would assume so. From your description, your problem is heavily dominated by memory access. Can't you do the multiplication with L before the solver?

Best,
Guido
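
As a rough illustration of why a solve like this tends to be limited by memory access rather than arithmetic, here is a back-of-envelope sketch. It assumes a CSR-style sparse matrix with 8-byte values and 4-byte column indices, and the bandwidth figures in the example call are made-up placeholders, not measurements from the system discussed above. The function names are hypothetical and not part of deal.II.

```python
def spmv_intensity(value_bytes=8, index_bytes=4, vector_bytes=8):
    """Flops per byte for one nonzero of a sparse matrix-vector product.

    Assumption: each nonzero costs one multiply and one add, and needs
    the matrix value, its column index, and one source-vector entry
    to be loaded from memory (no cache reuse of the vector).
    """
    flops = 2
    bytes_moved = value_bytes + index_bytes + vector_bytes
    return flops / bytes_moved


def bandwidth_bound_speedup(n_tasks, single_task_bw, total_bw):
    """Upper bound on speedup when all tasks share one memory bus.

    single_task_bw: bandwidth one task can draw on its own (GB/s)
    total_bw:       bandwidth the memory system can deliver in total (GB/s)
    """
    return min(n_tasks, total_bw / single_task_bw)


if __name__ == "__main__":
    print(f"{spmv_intensity():.2f} flop/byte")
    # Placeholder numbers: if one task already uses most of the bus,
    # adding seven more tasks cannot help much.
    print(bandwidth_bound_speedup(8, single_task_bw=9.0, total_bw=10.0))
```

With so few flops per byte loaded, the cores mostly wait for data, so the achievable speedup is set by how much bandwidth is left over after a single task, not by the number of tasks.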
