My idea was to observe timing behaviour similar to step-55, as the preconditioner template I have followed most closely emulates the implementation in step-55. The number of degrees of freedom is not significantly different from those observed at cycles 4 and 5. When I run step-55 with 2 processes and the same version of deal.II, I get the performance for cycles 4 and 5 shown below. This indicated to me that for 2 MPI processes and this number of DoFs, the solution time should be quite low. Once I had reasonable solution times, I was planning to scale the code to a larger number of degrees of freedom.

Ah, I see -- so the question you're really asking is why it takes 493 seconds to solve a problem with 215,000 unknowns. That's likely because you do 19 outer iterations, and in each one you call a direct solver that decomposes the same matrix anew -- 19 factorizations in total.
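
To put rough numbers on this (assuming the factorizations dominate the run time, which is typical for sparse direct solvers): 493 s over 19 factorizations is about 26 s per factorization. If you decomposed the matrix only once and re-applied the stored factors, you would expect to pay roughly one factorization plus 19 comparatively cheap triangular solves.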


Prof. Bangerth, would there be a way to do this when using SparseDirectMUMPS? From reading the documentation, I only see a solve() function. The alternative would be to use SparseILU. Do you recommend using SparseILU instead?

I don't recall the exact interface of SparseDirectMUMPS from past releases. SparseDirectUMFPACK allows you to compute a decomposition only once, and then apply it repeatedly in vmult(). The interface of SparseDirectMUMPS that's in the current developer version also allows you to do that. If you can switch to the current developer version (or check whether 9.7 can do that as well), you may want to try that.
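
As a minimal sketch of that pattern with SparseDirectUMFPACK (the wrapper class below is illustrative, not taken from your code):

#include <deal.II/lac/sparse_direct.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

using namespace dealii;

// Wrap a one-time factorization so that the outer iteration can apply
// A^{-1} repeatedly without ever re-decomposing the matrix.
class DirectInverse
{
public:
  // Expensive: computes and stores the LU decomposition. Call this
  // once, right after assembling the matrix.
  void initialize(const SparseMatrix<double> &A)
  {
    factorization.initialize(A);
  }

  // Cheap: only forward/backward substitution with the stored
  // factors. This is what the outer solver calls in every iteration.
  void vmult(Vector<double> &dst, const Vector<double> &src) const
  {
    factorization.vmult(dst, src);
  }

private:
  SparseDirectUMFPACK factorization;
};

The important point is that initialize() lives outside the outer loop, so the 19 outer iterations only ever trigger the cheap vmult() calls.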

SparseILU works sometimes, but it typically does not scale well to large problems. (Sparse direct solvers often do not either, but at least they don't require endless fiddling with settings.)
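
If you do want to try SparseILU, the setup follows the same once-only pattern; here is a sketch (the tolerance and iteration limit are placeholders):

#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/sparse_ilu.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

using namespace dealii;

void solve_with_ilu(const SparseMatrix<double> &A,
                    Vector<double>             &solution,
                    const Vector<double>       &rhs)
{
  // Build the incomplete factorization once...
  SparseILU<double> ilu;
  ilu.initialize(A, SparseILU<double>::AdditionalData());

  // ...and reuse it for every solve with this matrix.
  SolverControl            control(1000, 1e-8 * rhs.l2_norm());
  SolverCG<Vector<double>> cg(control);
  cg.solve(A, solution, rhs, ilu);
}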


Since there is clearly an inefficiency when using DirectInverseMatrix objects as my preconditioner, I switched to using InverseMatrix as my inner solver, which uses CG with an AMG preconditioner as shown in the code below:
[...]
I don't seem to observe any improvement in performance. From my observations, the second CG solve with the (1,1) block takes around 70 iterations to converge, which accounts for the bulk of the computation time. I would most likely have to improve the preconditioner precAs here, which might bring down the iteration count and speed things up. Do you think this would be the right way to approach this problem?

I can't see where you use the AMG preconditioners, but the same applies: You should only set them up once and then re-use many times. That is, the preconditioners need to live *outside* the place where you solve the inner (1,1) block.
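
Schematically, and using the Trilinos wrappers only for illustration (the same pattern carries over to the PETSc wrappers that step-55 uses; the InverseMatrix class below follows the idea of step-22 and is otherwise made up):

#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

using namespace dealii;

// Inner solver that only *references* a matrix and a preconditioner
// that were built elsewhere. Constructing this object is cheap; no
// AMG hierarchy is rebuilt here.
template <class MatrixType, class PreconditionerType>
class InverseMatrix
{
public:
  InverseMatrix(const MatrixType &m, const PreconditionerType &p)
    : matrix(m)
    , preconditioner(p)
  {}

  void vmult(TrilinosWrappers::MPI::Vector       &dst,
             const TrilinosWrappers::MPI::Vector &src) const
  {
    SolverControl control(src.size(), 1e-8 * src.l2_norm());
    SolverCG<TrilinosWrappers::MPI::Vector> cg(control);
    dst = 0;
    cg.solve(matrix, dst, src, preconditioner);
  }

private:
  const MatrixType         &matrix;
  const PreconditionerType &preconditioner;
};

// Once, right after assembly:
//   TrilinosWrappers::PreconditionAMG amg;
//   amg.initialize(system_matrix.block(1, 1));
//   InverseMatrix<TrilinosWrappers::SparseMatrix,
//                 TrilinosWrappers::PreconditionAMG>
//     A11_inverse(system_matrix.block(1, 1), amg);
//
// The outer iteration then only ever calls A11_inverse.vmult(...).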

Perhaps as a general rule: people spend whole PhD theses developing good parallel solvers and preconditioners. In industry, consultants are paid thousands or tens of thousands of dollars to figure out good solvers and preconditioners. You should expect that figuring this out is a long learning process that involves developing the skills to set up block preconditioners in the right places, and to find ways to time the right places. This is not going to be an easy process, nor is there a magic bullet that the good people on this mailing list can offer that will magically make it work for you.

Best
 W.
