I am a bit surprised that your assembly is really 100x more expensive than your linear solver. Maybe your assembly code is not optimized? For example, I try to do as little work as possible inside the double loop over DOFs, which is the innermost loop. Pre-calculating things in the outer loops sometimes really speeds up the calculation. This also depends on the polynomials you are using for interpolation: if you are using high-order polynomials, I think this is where you will reap significant benefits from a matrix-free approach.
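To illustrate the point about hoisting work out of the innermost DOF loops, here is a minimal, self-contained sketch (plain C++, not actual deal.II code; all names here are illustrative). Both functions assemble the same local matrix A_ij = sum_q w_q * c(x_q) * phi_i(x_q) * phi_j(x_q), but the second pre-computes the quadrature-dependent factors outside the i/j double loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive version: recomputes weights[q] * coeff[q] (and the product with
// phi[q][i]) for every (i, j) pair in the innermost double loop.
std::vector<double>
assemble_local_naive(const std::vector<std::vector<double>> &phi, // phi[q][i]
                     const std::vector<double> &weights,          // w_q
                     const std::vector<double> &coeff)            // c(x_q)
{
  const std::size_t n_q = phi.size(), n_dofs = phi[0].size();
  std::vector<double> A(n_dofs * n_dofs, 0.0);
  for (std::size_t q = 0; q < n_q; ++q)
    for (std::size_t i = 0; i < n_dofs; ++i)
      for (std::size_t j = 0; j < n_dofs; ++j)
        A[i * n_dofs + j] += weights[q] * coeff[q] * phi[q][i] * phi[q][j];
  return A;
}

// Hoisted version: the q-dependent product is computed once per quadrature
// point, and the (q, i)-dependent product once per row, before the j loop.
std::vector<double>
assemble_local_hoisted(const std::vector<std::vector<double>> &phi,
                       const std::vector<double> &weights,
                       const std::vector<double> &coeff)
{
  const std::size_t n_q = phi.size(), n_dofs = phi[0].size();
  std::vector<double> A(n_dofs * n_dofs, 0.0);
  for (std::size_t q = 0; q < n_q; ++q)
    {
      const double wc = weights[q] * coeff[q];    // hoisted out of i/j loops
      for (std::size_t i = 0; i < n_dofs; ++i)
        {
          const double wc_phi_i = wc * phi[q][i]; // hoisted out of j loop
          for (std::size_t j = 0; j < n_dofs; ++j)
            A[i * n_dofs + j] += wc_phi_i * phi[q][j];
        }
    }
  return A;
}
```

In real deal.II assembly the same idea applies to anything that depends only on the quadrature point (coefficients, JxW values, source terms): evaluate them once per q, store them, and keep the i/j loops as lean as possible.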
Feel free to ask more questions :)!

Best,
Bruno

On Thursday, December 8, 2022 at 11:10:51 a.m. UTC-5 [email protected] wrote:

> Hi,
>
> I've written here before and hope I'm not misusing the mailing list;
> however, I've been looking a bit in the documentation and here and haven't
> really found a conclusive answer.
>
> I am aiming to solve a nonlinear hyperbolic transport equation. For the
> sake of the argument, let's say it reads
>
> mu \cdot \nabla f(x) = - f(x)^2 - 2*b(x)*f(x) - a(x),
>
> which is, of course, a Riccati equation (up to signs, possibly). In my
> case, f is a complex-valued function, but this is of little relevance here.
> Since it's a nonlinear problem, I need to construct both the Jacobian and
> the residual. For starters, I do that in each step.
>
> I've managed to implement this and even gotten a PETSc-parallelized version
> to work, and am very happy. (I love deal.II, by the way - very impressive.)
> It does not scale "optimally" on my small laptop, but it's still a fine
> speedup when using MPI. So far so good.
>
> However, I want to solve my problem for many different directions vF, and
> then extract all the solutions and do something with them. As such, my
> problem is not that I need a very large number of DOFs / huge meshes: my
> typical mesh will be on the order of 10,000 unknowns, maybe 100k, but not
> millions. Rather, I want the individual solves to be as fast as possible,
> since I need to do on the order of 100-10,000 of them, depending on the
> problem at hand.
>
> I've done some layman's benchmarking of the individual "steps" (setup,
> assembly, solve, ...) in my current version of the code. It looks as if the
> assembly takes several orders of magnitude (~100x at least) longer than the
> solving part.
>
> My question is now: what is the best strategy to speed up assembly; is
> there any experience with this? I've read about different approaches and am
> confused about what's promising for small-scale problems.
> So far I'm considering:
>
> 1) Using a matrix-free approach rather than PETSc. This seems to be a win
> in most cases, but would involve rewriting large parts of the code, and I
> am not sure I will gain a lot given my small system size.
>
> 2) Only assembling the Jacobian every few steps, but the residual in every
> step. This is probably easier to implement. I know from experience with my
> problem that I pretty quickly land in a situation where I need only one or
> two Newton steps to find the solution to my nonlinear equation, so the
> saving will be small at best.
>
> Is there anything else one can do?
> So far I've been using MeshWorker, which is fine and understandable to me,
> but e.g. the boundary term as used in Example 12 queries the scalar product
> of \mu and the edge normal in each boundary element, which seems like a
> possible slowdown, in addition to generating jumps and averages on inner
> cell edges.
>
> Any help is much appreciated. Sorry for the long text!
> /Kev

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/4977f9c9-45e8-4fc0-aed4-e6c9bd1783cfn%40googlegroups.com.
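Regarding option (2) in the quoted message above: re-assembling the Jacobian only every few steps while evaluating the residual in every step is a "lagged-Jacobian" (modified) Newton iteration. A minimal scalar sketch follows, using the pointwise Riccati residual r(f) = f^2 + 2*b*f + a from the equation above with the advection term dropped for illustration; the function name and parameters are purely illustrative, not deal.II API:

```cpp
#include <cassert>
#include <cmath>

// Modified Newton for the scalar model residual r(f) = f^2 + 2*b*f + a:
// the residual is evaluated every iteration, but the Jacobian
// J = r'(f) = 2*f + 2*b is refreshed only every `refresh` iterations.
// In the PDE setting, "refresh" corresponds to re-assembling the
// Jacobian matrix, while the residual vector is assembled each step.
double solve_riccati_lagged(double a, double b, double f,
                            int refresh, int max_iter, double tol)
{
  double J = 2.0 * f + 2.0 * b;                  // initial Jacobian
  for (int k = 0; k < max_iter; ++k)
    {
      const double r = f * f + 2.0 * b * f + a;  // residual: every step
      if (std::fabs(r) < tol)
        break;
      if (k > 0 && k % refresh == 0)
        J = 2.0 * f + 2.0 * b;                   // Jacobian: every `refresh` steps
      f -= r / J;                                // (lagged) Newton update
    }
  return f;
}
```

Between refreshes the convergence rate degrades from quadratic to linear, which is why, as noted in the message, the saving is small when only one or two Newton steps are needed anyway; the trade-off pays off mainly when many cheap steps with a stale Jacobian replace a few expensive re-assemblies.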
