Hello Prof. Wolfgang,
"I don't recall the exact interface of SparseDirectMUMPS from past
releases. SparseDirectUMFPACK allows you to compute a decomposition only
once, and then apply it repeatedly in vmult(). The interface of
SparseDirectMUMPS that's in the current developer version also allows
you to do that. If you can switch to the current developer version (or
check whether 9.7 can do that as well), you may want to try that. "
I used SparseDirectMUMPS as you suggested, i.e. I initialize it in the
constructor and then use it repeatedly in the solve function. I believe it
applies the stored factorization each time vmult() is called; a minimal
sketch of the pattern is included below.
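For concreteness, here is a minimal sketch of the decompose-once /
apply-many pattern, written with SparseDirectUMFPACK, whose
decompose-once/vmult() interface you mentioned; my SparseDirectMUMPS usage
follows the same shape. The class and member names below are placeholders,
not my actual code.

#include <deal.II/lac/sparse_direct.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

class DirectBlockInverse
{
public:
  DirectBlockInverse(const dealii::SparseMatrix<double> &A)
  {
    // The factorization is computed once, in the constructor.
    A_inverse.initialize(A);
  }

  void vmult(dealii::Vector<double>       &dst,
             const dealii::Vector<double> &src) const
  {
    // Each application only performs forward/backward substitution,
    // no refactorization.
    A_inverse.vmult(dst, src);
  }

private:
  dealii::SparseDirectUMFPACK A_inverse;
};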
" I can't see where you use the AMG preconditioners, but the same applies:
You should only set them up once and then re-use many times. That is,
the preconditioners need to live *outside* the place where you solve the
inner (1,1) block. "
I have added the code snippet that initializes and applies the CG+AMG
preconditioner below. As it happens, the inner (1,1) block arises from the
symmetric-gradient discretization of a Stokes system (as in step-22). Used
as is, BoomerAMG has no way of being told that multiple DoFs are associated
with each node; PETSc exposes such an option for BoomerAMG
( *-pc_hypre_boomeramg_nodal_coarsen* ), but it is not available through the
deal.II interface. The solutions I can think of are:
1. Add the option to the PETScWrappers source code and test it. I'm not
sure whether this has been tried before with the PETSc wrappers, or
whether a particular DoF re-ordering needs to be kept in mind to make
it work.
2. Switch to the Trilinos wrappers. However, my research group has more
experience with PETSc, and the switch would mean losing the linear
algebra library support we currently rely on.
3. Modify the formulation from a symmetric-gradient weak form to a
vector-Laplacian weak form. This requires modeling justification and
would be problem specific.
For now, I would prefer option 1, but I would like to know whether it has
been tried before and what caveats are associated with it; a rough sketch of
what I have in mind is below.
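To make option 1 concrete, here is a hypothetical sketch (untested) of what
I imagine: pushing the hypre option into PETSc's options database before
initializing the preconditioner. Whether PETScWrappers::PreconditionBoomerAMG
actually consults the options database (i.e. calls PCSetFromOptions) is
exactly the part I am unsure about, so please treat this as an illustration
of the idea rather than working code.

// Hypothetical sketch only: I have not verified that the deal.II BoomerAMG
// wrapper picks up entries from the PETSc options database.
#include <deal.II/base/exceptions.h>
#include <deal.II/lac/petsc_precondition.h>
#include <deal.II/lac/petsc_sparse_matrix.h>
#include <petscsys.h>

void setup_nodal_amg(const dealii::PETScWrappers::MPI::SparseMatrix &A00,
                     dealii::PETScWrappers::PreconditionBoomerAMG   &prec)
{
  // Ask BoomerAMG to coarsen nodally, i.e. treat the DoFs belonging to one
  // node as a block (scheme "1" chosen here only as an example value).
  const PetscErrorCode ierr =
    PetscOptionsSetValue(nullptr, "-pc_hypre_boomeramg_nodal_coarsen", "1");
  AssertThrow(ierr == 0, dealii::ExcMessage("PetscOptionsSetValue failed"));

  dealii::PETScWrappers::PreconditionBoomerAMG::AdditionalData data;
  data.symmetric_operator = true;
  prec.initialize(A00, data);
}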
The code for using CG+AMG as a preconditioner:

{
  using invCG = LinearSolver::InverseMatrix<LA::MPI::SparseMatrix,
                                            LA::MPI::PreconditionAMG>;

  // AMG preconditioner for the (0,0) block of A
  LA::MPI::PreconditionAMG precAd;
  {
    LA::MPI::PreconditionAMG::AdditionalData data;
#ifdef USE_PETSC_LA
    data.symmetric_operator = true;
    data.max_iter           = 4;
    data.output_details     = true;
#endif
    precAd.initialize(A.block(0, 0), data);
  }

  // AMG preconditioner for the (1,1) block of A
  LA::MPI::PreconditionAMG precAs;
  {
    LA::MPI::PreconditionAMG::AdditionalData data;
#ifdef USE_PETSC_LA
    data.symmetric_operator = true;
    data.max_iter           = 4;
    data.output_details     = true;
#endif
    precAs.initialize(A.block(1, 1), data);
  }

  // AMG preconditioner for the (2,2) block of Mp
  LA::MPI::PreconditionAMG precMp;
  {
    LA::MPI::PreconditionAMG::AdditionalData data;
#ifdef USE_PETSC_LA
    data.symmetric_operator = true;
#endif
    precMp.initialize(Mp.block(2, 2), data);
  }

  // Inner CG solvers wrapping each block with its AMG preconditioner
  const invCG inv_Ad(A.block(0, 0), precAd);
  const invCG inv_As(A.block(1, 1), precAs);
  const invCG inv_Mp(Mp.block(2, 2), precMp);

  const LinearSolver::BlockDiagonalPreconditioner<invCG, invCG, invCG>
    prec_M(inv_Ad, inv_As, inv_Mp);
with the inverse matrix class defined as:

template <class Matrix, class Preconditioner>
class InverseMatrix : public Subscriptor
{
public:
  InverseMatrix(const Matrix &K, const Preconditioner &M)
    : K(&K)
    , M(M)
    , pcout(std::cout,
            (Utilities::MPI::this_mpi_process(MPI_COMM_WORLD) == 0))
  {}

  template <typename VectorType>
  void vmult(VectorType &dst, const VectorType &src) const;

private:
  const SmartPointer<const Matrix> K;
  const Preconditioner            &M;
  ConditionalOStream               pcout;
};
template <class Matrix, class Preconditioner>
template <typename VectorType>
void InverseMatrix<Matrix, Preconditioner>::vmult(VectorType       &dst,
                                                  const VectorType &src) const
{
  SolverControl solver_control(src.size(), 1e-8 * src.l2_norm());
  solver_control.log_history(true);
  SolverCG<VectorType> cg(solver_control);

  dst = 0;
  try
    {
      cg.solve(*K, dst, src, M);
      pcout << "CG solved in " << solver_control.last_step() << " iterations"
            << std::endl;
    }
  catch (std::exception &e)
    {
      Assert(false, ExcMessage(e.what()));
    }
}
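Regarding your point that the preconditioners need to live outside the place
where the inner (1,1) block is solved: is the restructuring below roughly
what you have in mind? It is only a schematic sketch reusing the names from
my snippet above; the class name is hypothetical, and it assumes A, Mp and
the invCG alias are visible at class scope.

class StokesSolver  // hypothetical name
{
  // ...
  void setup_preconditioners();  // called once, after assembly
  void solve();                  // called in every outer iteration

  // The AMG hierarchies live with the class, not inside solve():
  LA::MPI::PreconditionAMG precAd, precAs, precMp;
};

void StokesSolver::setup_preconditioners()
{
  LA::MPI::PreconditionAMG::AdditionalData data;
#ifdef USE_PETSC_LA
  data.symmetric_operator = true;  // (other settings as in the snippet above)
#endif
  precAd.initialize(A.block(0, 0), data);
  precAs.initialize(A.block(1, 1), data);
  precMp.initialize(Mp.block(2, 2), data);
}

void StokesSolver::solve()
{
  // Only the cheap wrapper objects are rebuilt here; the AMG setup is reused.
  const invCG inv_Ad(A.block(0, 0), precAd);
  const invCG inv_As(A.block(1, 1), precAs);
  const invCG inv_Mp(Mp.block(2, 2), precMp);

  const LinearSolver::BlockDiagonalPreconditioner<invCG, invCG, invCG>
    prec_M(inv_Ad, inv_As, inv_Mp);

  // ... outer Krylov solve using prec_M ...
}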
Thank you for your help once again.
Regards,
Nihar
On Tuesday, 11 November 2025 at 10:58:33 UTC-8 Wolfgang Bangerth wrote:
>
> > My idea was to observe timing behaviour similar to step-55, as the
> > preconditioner template I have followed most closely emulates the
> > implementation in step-55. The number of degrees of freedom is not
> > significantly different from that at cycles 4 and 5. When I run step-55
> > with 2 processes and the same version of deal.II, I get the performance
> > shown below for cycles 4 and 5. This indicated to me that for 2 MPI
> > processes and this number of DoFs, the solution time should be quite
> > low. Once I had reasonable solution times, I was planning to scale the
> > code to a larger number of degrees of freedom.
>
> Ah, I see -- so the question you're really asking is why it takes 493
> seconds to solve a problem with 215,000 unknowns. That's likely because
> you do 19 outer iterations, and in each of them you call a direct solver
> that decomposes the same matrix anew -- that is, you factorize it 19 times.
>
>
> > Prof. Bangerth, would there be a way to do this when using
> > SparseDirectMUMPS? From reading the documentation, I only see a solve()
> > function. The alternative would be to use sparseILU. Do you recommend
> > using sparseILU instead?
>
> I don't recall the exact interface of SparseDirectMUMPS from past
> releases. SparseDirectUMFPACK allows you to compute a decomposition only
> once, and then apply it repeatedly in vmult(). The interface of
> SparseDirectMUMPS that's in the current developer version also allows
> you to do that. If you can switch to the current developer version (or
> check whether 9.7 can do that as well), you may want to try that.
>
> SparseILU works sometimes, but it typically does not scale well to large
> problems. (Sparse direct solvers often do not either, but at least they
> don't require endless fiddling with settings.)
>
>
> > Since there is clearly an inefficiency in using DirectInverseMatrix
> > objects as my preconditioner, I switched to using InverseMatrix as my
> > inner solver, with CG and an AMG preconditioner as shown in the code
> > below:
> > [...]
> > I don't observe any improvement in performance. From my observations,
> > the second CG solve with the (1,1) block takes around 70 iterations to
> > converge, which accounts for the bulk of the computation time. I would
> > most likely have to improve the performance of precAs here, which might
> > bring down the iteration count and speed things up. Do you think this
> > would be the right way to approach this problem?
>
> I can't see where you use the AMG preconditioners, but the same applies:
> You should only set them up once and then re-use many times. That is,
> the preconditioners need to live *outside* the place where you solve the
> inner (1,1) block.
>
> Perhaps as a general rule, people spend whole PhD theses to develop good
> parallel solvers and preconditioners. In industry, consultants are paid
> thousands or tens of thousands of dollars to figure out good solvers and
> preconditioners. You should expect that figuring this out is a long
> learning process that involves developing the skills to set up block
> preconditioners in the right place, and to find ways of timing the right
> places. This is not going to be an easy process; there is also no magic
> bullet that the good people on this mailing list have and that will
> magically make it work for you.
>
> Best
> W.
>