Hello everyone,
first of all, thank you for all the detailed and kind answers! I will try
to reply to everything, but will probably miss a few points.
I knew about the debug and release modes and thought I was still in release
mode; apparently I had switched back to debug mode for benchmarking
purposes without remembering it. That is of course silly and accounts for a
large part of the discrepancy. The factor is now down from 100-200 to
around 4-5, which is a great improvement that I should have found myself.
Oh well.
I'm still very grateful for all the comments, since I'm currently in some
form of early alpha of my programme and trying to get the basics "right" so
that the layer surrounding the FEM solution step can be built on top. [In
case that is of interest: I'm working on simulating superconductors; in my
case the FEM solution gets fed into some form of self-consistency loop for
an energy landscape / the coefficients of the equation, which happens
outside of the FEM method.]
As for tutorial 22, I hadn't read the part about assembly and gladly did so
now. Tutorial 77 I had indeed thought about when asking about not
recalculating the Jacobian in each step; it is a very interesting thing to
add, but I might postpone SUNDIALS to a later stage. I worked with it a few
years ago and remember it being a beast to tame, so to speak, but probably
worth the effort.
Martin, thanks for all the background and suggestions regarding some
matrix-free / matrix-product options. I should definitely look into that
once I understand it better :-)
-----------------------------
Here the general answer stops, and a particular question follows - feel
free to skip reading on.
-----------------------------
One "bottleneck" in my current matrix-based approach is the "interface
worker" that calculates fluxes between neighbouring cells. I have tried to
avoid re-computations and to shift as much as possible outside of the inner
loops, which probably helped by a factor of 2 or so compared to my earliest
attempts. One thing in particular still confuses me:
I want to calculate an upwind value at interior interfaces. In the step-12
tutorial, this is done via something of the form
*fe_iv.shape_value((beta_dot_n > 0), j, qpoint) // phi_j^{upwind}*
which is very clear. Now, for my nonlinear problem I need to do something
like this with my "old" solution guess in the residual and Jacobian. Since
I have an FESystem with at least two components, for the Jacobian I end
up using
*fe_iv[component_1].value((beta_dot_n > 0), j, qpoint)*
which, to my limited understanding, is exactly the same thing. Now, for the
residual I need this term for my "last solution" vector, but I couldn't
find a corresponding function for that problem. What I did find was
*fe_iv[component_1].get_function_values((vF_dot_n > 0),
last_solution_ghost, tmp_array);*
which, however, does not operate point-wise but rather determines the value
vector for the entire cell(?), and then I have to query *tmp_array* at the
given *q_point*. Since I need to recalculate *beta_dot_n* at every point,
this means a lot of additional copy operations. I did not fully understand
whether *get_function_values_from_local_dof_values()* is the way to go.
My current solution is a workaround of sorts: I pre-calculate the averages
and jumps outside the triple loop over qpoints and DOFs i and j, and inside
the loop use
last_solution_average*beta_dot_n + 0.5*abs(beta_dot_n)*last_solution_jump
which does the job, but still requires calculating all these averages and
jumps rather than picking one value with a simple if-else query. Replacing
this type of construct in the Jacobian (with the above-mentioned function)
saved some 10% or so, although the Jacobian part is admittedly looped over
more often than the residual.
-------------------- question ends------------
Thanks for any hints so far, and I'm grateful for any further help - I
sincerely appreciate the top-notch documentation and apologize for not yet
having grasped all these fine details (and for obvious mishaps such as the
release flag...).
Best,
Kev
On Sunday, December 11, 2022 at 10:24:27 PM UTC+1 Martin Kronbichler wrote:
> Dear Kev,
>
> To give an additional perspective on this topic: The assembly functions of
> deal.II (when using FEValues and the approach shown in step-1 to step-30 at
> least) are not optimized for performance also when compiled in release
> mode, but rather for readability. On an average case with linear or
> quadratic polynomials, I would expect a factor of 10 in performance gain by
> using one of the following approaches:
>
> - You could express the three nested loops of a typical matrix assembly
> routine as a dense matrix-matrix multiplication. This requires you to
> rearrange some computations, especially when you work with systems of
> equations, but the gain comes from the fact that matrix-matrix
> multiplication is typically mapped to BLAS functions, which are among the
> most heavily optimized routines in existence (the LINPACK benchmark is
> dominated by this very operation). The step-65 tutorial program explains
> this for a
> symmetric problem, but it is not hard to do for non-symmetric problems.
>
> - You could express the code that computes the local matrix such that
> FEEvaluation (from the matrix-free framework) can compute the system matrix
> by applying the local operator to all unit vectors. You could also go
> completely matrix-free in the sense of not computing a matrix at all, but
> using more optimized strategies to compute the local matrices and residuals
> could be a middle ground when you don't want to fiddle with iterative
> solvers beyond PETSc's (great) default choices. The matrix-free framework
> uses computer hardware slightly less efficiently than BLAS-3, but it can
> leverage the structure in shape functions and hence lower complexity, often
> quite significantly.
>
> Obviously, both the above options require some work, and lead to
> not-so-nice code. It depends on your use case if that is worth it. Of
> course, all the other suggestions, like not assembling the matrix in every
> step (possibly via SUNDIALS) or gathering precise timings to take the right
> action, should also be in your list of considerations.
>
> Best,
> Martin
>
>
> On 09.12.22 22:07, Marc Fehling wrote:
>
> Hello Kev,
>
> just to make sure: did you compile the deal.II library and your code in
> Optimized mode/Release mode
> <https://www.dealii.org/current/readme.html#configuration>?
>
> Marc
>
> On Thursday, December 8, 2022 at 2:35:22 PM UTC-7 Wolfgang Bangerth wrote:
>
>>
>> Kev,
>>
>> > I've done some layman's benchmarking of the individual "steps" (setup,
>> > assembly, solve, ...) in my current version of the code. It looks as if
>> the
>> > assembly takes several orders of magnitude (~100 at least) longer than
>> the
>> > solving part.
>> >
>> > My question is now: What is the best strategy to speed up assembly, is
>> there
>> > any experience with this? I've read different approaches and am
>> confused
>> > what's promising for small-scale problems. So far I'm considering:
>> >
>> > 1) Using a matrix-free approach rather than PETSc - this seems to be a
>> win in
>> > most cases, would however consider rewriting large parts of the code
>> and I am
>> > not sure if I will gain a lot given my small system size.
>> >
>> > 2) Only assemble the Jacobian every few steps, but the residual in
>> every step.
>> > This is probably easier to implement. I know from experience with my
>> problem
>> > that I pretty quickly land in a situation where I need only one or two
>> Newton
>> > steps to find the solution to my nonlinear equation, so there saving
>> will be
>> > small at best.
>>
>> Like Bruno, it seems rather unlikely to me that assembly would take 100
>> times
>> longer than other things, and I would suggest you use something like
>> TimerOutput to time individual sections to narrow down where the issue
>> lies.
>>
>> Beyond this, only updating the Jacobian is indeed a fairly usual
>> strategy. One
>> can of course implement this by hand, but it's quite cumbersome to
>> implement
>> optimal algorithms to determine when updating is necessary, and I would
>> encourage you to take a look at step-77 to see how one can solve
>> nonlinear
>> problems efficiently through interfaces to advanced libraries such as
>> SUNDIALS:
>> https://dealii.org/developer/doxygen/deal.II/step_77.html
>>
>> I don't remember if KINSOL has ways to solve a sequence of nonlinear
>> systems,
>> re-using the Jacobian between systems. But if it does not, you can always
>> just
>> store a pointer to the last Jacobian matrix and whenever you start
>> solving a
>> linear system for the first time, copy the stored matrix over.
>>
>> In any case, I think the first step should be to look at (i) whether
>> assembly
>> really takes that long for you, (ii) understand why it takes that long.
>> If
>> step-77 is any indication, then assembling the Jacobian should really not
>> take
>> that much longer than actually solving linear systems.
>>
>> Best
>> W.
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Wolfgang Bangerth email: [email protected]
>> www: http://www.math.colostate.edu/~bangerth/
>>
>>
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/dealii/cfba430f-fa2c-475b-9dba-a95c8f6696den%40googlegroups.com.