My idea was to observe timing behaviour similar to step-55, as the
preconditioner template I have followed most closely emulates the
behaviour of the implementation in step-55. The number of degrees of
freedom is not significantly different from that observed at cycles 4
and 5. When I run step-55 with 2 MPI processes using the same version of
deal.II, I get the performance at cycles 4 and 5 shown below. This
indicated to me that, for 2 MPI processes and this number of DoFs, the
solution time should be quite low. Once I had
reasonable solution times, I was planning to scale the code to a larger
number of degrees of freedom.
Ah, I see -- so the question you're really asking is why it takes 493
seconds to solve a problem with 215,000 unknowns. That's likely because
you do 19 outer iterations, and in each one you call a direct solver
that decomposes the same matrix anew -- in other words, you factorize
the identical matrix 19 times.
Prof. Bangerth, would there be a way to do this when using
SparseDirectMUMPS? From reading the documentation, I only see a solve()
function. The alternative would be to use SparseILU. Do you recommend
using SparseILU instead?
I don't recall the exact interface of SparseDirectMUMPS from past
releases. SparseDirectUMFPACK allows you to compute a decomposition only
once, and then apply it repeatedly in vmult(). The interface of
SparseDirectMUMPS that's in the current developer version also allows
you to do that. If you can switch to the current developer version (or
check whether 9.7 can do that as well), you may want to try that.
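For illustration, here is a minimal sketch of the decompose-once
pattern with SparseDirectUMFPACK; the variable names (system_matrix,
dst, src) are placeholders, not taken from your code:

  #include <deal.II/lac/sparse_direct.h>
  #include <deal.II/lac/sparse_matrix.h>
  #include <deal.II/lac/vector.h>

  using namespace dealii;

  // ... assemble `system_matrix` (a SparseMatrix<double>) as usual ...

  SparseDirectUMFPACK A_inverse;
  A_inverse.initialize(system_matrix); // the LU factorization happens
                                       // exactly once, here

  // Inside the outer iteration, each application of the inverse is
  // then only a forward/backward substitution, not a re-factorization:
  Vector<double> dst(system_matrix.m());
  Vector<double> src(system_matrix.m());
  A_inverse.vmult(dst, src);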
SparseILU works sometimes, but it typically does not scale well to large
problems. (Sparse direct solvers often do not either, but at least they
don't require endless fiddling with settings.)
Since there is clearly an inefficiency when using DirectInverseMatrix
objects as my preconditioner, I switched to using InverseMatrix as my
inner solver, in which I use CG with an AMG preconditioner, as shown in
the code below:
[...]
I don't observe any improvement in performance. From my observations,
the second CG solve with the (1,1) block takes around 70 iterations to
converge, which accounts for the bulk of the computation time. I would
most likely have to improve the performance of precAs here, which might
bring down the iteration counts and speed things up.
Do you think this would be the right way to approach this problem?
I can't see where you use the AMG preconditioners, but the same applies:
You should set them up only once and then re-use them many times. That is,
the preconditioners need to live *outside* the place where you solve the
inner (1,1) block.
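To make this concrete, here is a hedged sketch of the structure I mean,
assuming Trilinos-backed matrices and vectors; the class and member
names (StokesSolver, amg_A, inner_solve) are illustrative, not from
your code. The point is that amg_A.initialize(A) runs once during
setup, while the inner solve only applies the stored preconditioner:

  #include <deal.II/lac/solver_cg.h>
  #include <deal.II/lac/solver_control.h>
  #include <deal.II/lac/trilinos_precondition.h>
  #include <deal.II/lac/trilinos_sparse_matrix.h>
  #include <deal.II/lac/trilinos_vector.h>

  using namespace dealii;

  struct StokesSolver
  {
    TrilinosWrappers::SparseMatrix    A;     // the (1,1) block
    TrilinosWrappers::PreconditionAMG amg_A; // lives next to the matrix

    void setup()
    {
      // ... assemble A ...
      amg_A.initialize(A); // expensive: build the AMG hierarchy once
    }

    void inner_solve(TrilinosWrappers::MPI::Vector       &x,
                     const TrilinosWrappers::MPI::Vector &b) const
    {
      SolverControl control(1000, 1e-8 * b.l2_norm());
      SolverCG<TrilinosWrappers::MPI::Vector> cg(control);
      cg.solve(A, x, b, amg_A); // cheap: re-uses the stored hierarchy
    }
  };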
Perhaps as a general rule: people spend whole PhD theses developing good
parallel solvers and preconditioners. In industry, consultants are paid
thousands or tens of thousands of dollars to figure out good solvers and
preconditioners. You should expect that figuring this out is a long
learning process that involves developing the skills to set up block
preconditioners in the right place, and to find ways to time the right
places. This is not going to be an easy process, and there is no magic
bullet that the good people on this mailing list can hand you to make it
all magically work.
Best
W.