Dear Kev,
To give an additional perspective on this topic: The assembly functions
of deal.II (when using FEValues and the approach shown in step-1 to
step-30 at least) are not optimized for performance also when compiled
in release mode, but rather for readability. On an average case with
linear or quadratic polynomials, I would expect a factor of 10 in
performance gain by using one of the following approaches:
- You could express the three nested loop of a typical matrix assembly
routine as a dense matrix-matrix multiplication. This requires you to
rearrange some computations, especially when you work with systems of
equations, but the gain comes from the fact that matrix-matrix
multiplication is typically mapped to BLAS functions, which are one of
the most optimized algorithms in the world (as LINPACK uses it as the
dominating algorithm). The step-65 tutorial program explains this for a
symmetric problem, but it is not hard to do for non-symmetric problems.
- You could express the code that computes the local matrix such that
FEEvaluation (from the matrix-free framework) can compute the system
matrix by applying the local operator to all unit vectors. You could
also go completely matrix-free in the sense of not computing a matrix at
all, but using more optimized strategies to compute the local matrices
and residuals could be a middle ground when you don't want to fiddle
with iterative solvers beyond PETSc's (great) default choices. The
matrix-free framework uses computer hardware slightly less efficiently
than BLAS-3, but it can leverage the structure in shape functions and
hence lower complexity, often quite significantly.
Obviously, both the above options require some work, and lead to
not-so-nice code. It depends on your use case if that is worth it. Of
course, all the other suggestions, like not assembling the matrix in
every step (possibly via SUNDIALS) or gathering precise timings to take
the right action, should also be in your list of considerations.
Best,
Martin
On 09.12.22 22:07, Marc Fehling wrote:
Hello Kev,
just to make sure: did you compile the deal.II library and your code
in Optimized mode/Release mode
<https://www.dealii.org/current/readme.html#configuration>?
Marc
On Thursday, December 8, 2022 at 2:35:22 PM UTC-7 Wolfgang Bangerth wrote:
Kev,
> I've done some layman's benchmarking of the individual "steps"
(setup,
> assembly, solve, ...) in my current version of the code. It
looks as if the
> assembly takes several orders of magnitude (~100 at least)
longer than the
> solving part.
>
> My question is now: What is the best strategy to speed up
assembly, is there
> any experience with this? I've read different approaches and am
confused
> what's promising for small-scale problems. So far I'm considering:
>
> 1) Using a matrix-free approach rather than PETSc - this seems
to be a win in
> most cases, would however consider rewriting large parts of the
code and I am
> not sure if I will gain a lot given my small system size.
>
> 2) Only assemble the Jacobian every few steps, but the residual
in every step.
> This is probably easier to implement. I know from experience
with my problem
> that I pretty quickly land in a situation where I need only one
or two Newton
> steps to find the solution to my nonlinear equation, so there
saving will be
> small at best.
Like Bruno, it seems rather unlikely to me that assembly would
take 100x times
longer than other things, and I would suggest you use something like
TimerOutput to time individual sections to narrow down where the
issue lies.
Beyond this, only updating the Jacobian is indeed a fairly usual
strategy. One
can of course implement this by hand, but it's quite cumbersome to
implement
optimal algorithms to determine when updating is necessary, and I
would
encourage you to take a look at step-77 to see how one can solve
nonlinear
problems efficiently through interfaces to advanced libraries such
as SUNDIALS:
https://dealii.org/developer/doxygen/deal.II/step_77.html
I don't remember if KINSOL has ways to solve a sequence of
nonlinear system,
re-using the Jacobian between systems. But if it does not, you can
always just
store a pointer to the last Jacobian matrix and whenever you start
solving a
linear system for the first time, copy the stored matrix over.
In any case, I think the first step should be to look at (i)
whether assembly
really takes that long for you, (ii) understand why it takes that
long. If
step-77 is any indication, then assembling the Jacobian should
really not take
that much longer than actually solving linear systems.
Best
W.
--
------------------------------------------------------------------------
Wolfgang Bangerth email: [email protected]
www: http://www.math.colostate.edu/~bangerth/
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google
Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/dealii/5bb618cf-52ce-4871-944b-aeb56d1b4c50n%40googlegroups.com
<https://groups.google.com/d/msgid/dealii/5bb618cf-52ce-4871-944b-aeb56d1b4c50n%40googlegroups.com?utm_medium=email&utm_source=footer>.
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/dealii/cd83f257-098a-dfeb-840b-945f2bd7c65e%40gmail.com.