I appreciate that they are tutorials, but given that I need a 
projection-step Navier-Stokes solver and that I lack the FEM background, I 
thought I'd start from that code.

Size of the problem: my main modification is reading in my own mesh, so I 
can make the problem much larger.

ILU: I'm ashamed to admit that that obvious fact slipped by me. Ok, I've 
switched to a Block Jacobi. That does increase the thread activity. Yay.
However, the pressure solve does not converge. Do I understand correctly 
that BlockJacobi uses full inverses of the diagonal blocks? That ought to 
be pretty good.
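
For reference, a minimal sketch of what "full inverses of the blocks" means for a block Jacobi preconditioner, assuming equally sized dense diagonal blocks. This is not deal.II's implementation; `Block`, `invert_block` and `apply_block_jacobi` are hypothetical names used only for illustration. Each diagonal block is inverted once up front, and applying the preconditioner is then one small dense matrix-vector product per block, independent of all other blocks (which is where the thread parallelism comes from).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type: one dense n x n block, stored row-major.
using Block = std::vector<double>;

// Invert a dense n x n block by Gauss-Jordan elimination with partial pivoting.
Block invert_block(Block a, std::size_t n)
{
  assert(a.size() == n * n);
  Block inv(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    inv[i * n + i] = 1.0;

  for (std::size_t col = 0; col < n; ++col)
    {
      // Choose the largest remaining entry in this column as pivot.
      std::size_t piv = col;
      for (std::size_t r = col + 1; r < n; ++r)
        if (std::fabs(a[r * n + col]) > std::fabs(a[piv * n + col]))
          piv = r;
      assert(a[piv * n + col] != 0.0); // the block must be nonsingular

      if (piv != col)
        for (std::size_t c = 0; c < n; ++c)
          {
            std::swap(a[col * n + c], a[piv * n + c]);
            std::swap(inv[col * n + c], inv[piv * n + c]);
          }

      // Scale the pivot row so the pivot becomes 1.
      const double d = a[col * n + col];
      for (std::size_t c = 0; c < n; ++c)
        {
          a[col * n + c] /= d;
          inv[col * n + c] /= d;
        }

      // Eliminate this column from every other row.
      for (std::size_t r = 0; r < n; ++r)
        {
          if (r == col)
            continue;
          const double f = a[r * n + col];
          for (std::size_t c = 0; c < n; ++c)
            {
              a[r * n + c] -= f * a[col * n + c];
              inv[r * n + c] -= f * inv[col * n + c];
            }
        }
    }
  return inv;
}

// Apply z = M^{-1} r, where M is the block diagonal of the matrix and
// inverses[b] holds the precomputed inverse of diagonal block b.
std::vector<double> apply_block_jacobi(const std::vector<Block> &inverses,
                                       std::size_t               n,
                                       const std::vector<double> &r)
{
  assert(r.size() == inverses.size() * n);
  std::vector<double> z(r.size(), 0.0);
  for (std::size_t b = 0; b < inverses.size(); ++b) // blocks are independent
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
        z[b * n + i] += inverses[b][i * n + j] * r[b * n + j];
  return z;
}
```

Note that the preconditioner ignores all coupling between blocks, so exact block inverses alone do not guarantee that the outer iteration converges quickly.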

"This costs some additional memory - for DG methods about 1/3 (for double 
inverses) or 1/6 (for float inverses) of that used for the matrix - but it 
makes the preconditioning much faster."

I don't get that. Is DG so different from regular FEM? The diagonal block 
of a FEM matrix is itself a FEM matrix on a subdomain, so the inverse (even 
if you mean that you "invert" by doing a full LU) takes a ton more storage.
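
The two situations can be compared with a rough entry count. This is only a sketch under stated assumptions: in the DG reading, each diagonal block is the small dense cell-local block and each block row also holds a fixed number of equally sized neighbor blocks (the count of 2 below is an assumption chosen to reproduce the quoted 1/3, not something taken from the documentation); in the continuous-FEM reading, each block is a large sparse subdomain matrix whose inverse is dense. Index storage is ignored, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Entries needed to store dense inverses of all diagonal blocks.
std::size_t inverse_entries(std::size_t n_blocks, std::size_t block_size)
{
  return n_blocks * block_size * block_size;
}

// Entries in a DG-style matrix: per block row, one dense diagonal block plus
// `neighbors` dense off-diagonal blocks (assumed count; depends on the mesh).
std::size_t dg_matrix_entries(std::size_t n_blocks,
                              std::size_t block_size,
                              std::size_t neighbors)
{
  assert(block_size > 0);
  return n_blocks * (1 + neighbors) * block_size * block_size;
}

// Entries in a sparse matrix with roughly `nnz_per_row` nonzeros per row,
// as for a continuous finite element discretization.
std::size_t sparse_matrix_entries(std::size_t n_rows, std::size_t nnz_per_row)
{
  return n_rows * nnz_per_row;
}
```

With 1000 cell blocks of size 9 and 2 neighbor blocks per row, `inverse_entries(1000, 9)` is one third of `dg_matrix_entries(1000, 9, 2)`. With 4 subdomain blocks of 1000 rows each and about 9 nonzeros per row, `inverse_entries(4, 1000)` is more than 100 times `sparse_matrix_entries(4000, 9)`, which matches the objection raised above.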

My problem doesn't get awfully large, and I have lots of memory. Should I 
use a direct solver for both the pressure and velocity solves? Can I assume 
that UMFPACK is suitably multi-threaded?

Your thoughts as always appreciated.

V.


On Wednesday, May 13, 2026 at 3:11:01 PM UTC-5 Wolfgang Bangerth wrote:

> On 5/10/26 08:22, Victor Eijkhout wrote:
> > Fair comment. I'm running your step35 tutorials with minimal 
> > modifications.
> > 
> > Is there documentation on how dealII does parallelism? I've come across 
> > mention of OMP_NUM_THREADS as a limit, and clearly it is using hwloc to 
> > discover hardware parallelism, and reading the source I see 
> > OMP_DEAL_II_THREADS, but I'm not sure how the whole caboodle fits 
> > together.
>
> Victor,
> most of the parallelism would have to happen in the application program 
> (step-35 in your case). The key loop of that program looks like this:
>
> loop:
> interpolate_velocity();
> diffusion_step(...);
> projection_step(...);
> update_pressure(...);
>
> The second and third of these do expensive things like assembling linear 
> systems and solving them. For assembling the linear system, the program 
> uses WorkStream, which we know scales reasonably well to perhaps a dozen 
> cores (or maybe two dozen, depending on what the workload actually is). 
> For solving linear systems, there isn't much parallelism to be had: The 
> matrices are very small (31k and 4k rows) so matrix-vector products do 
> not scale well to more than at most a handful of cores, and the 
> preconditioner used (ILU) has no parallelism at all -- not because it 
> isn't implemented, but because one can't parallelize forward/backward 
> substitution at all.
>
> So I'm not surprised you don't get much speed-up. You'd need (i) a much 
> larger program to make those operations that are parallelized work well, 
> and (ii) to use different algorithms than the ones used in this program 
> to solve linear problems. Both of these are of course possible, it's just 
> not what this program does. Nor what its intent is: the tutorials are 
> meant to *teach* how to write finite element codes; they're not 
> *intended* to be HPC-ready applications.
>
> Best
> W.
>
