Pete,
> Bruno, I assumed that the thread with 100% CPU usage was somehow feeding
> the others in step-8.
It's more that, for some functions, we split the operations onto as many
threads as there are CPUs. But then the next function you call may not be
parallelized, and so everything runs on only one thread. On average, that one
thread has a load of 100% whereas the others have a lesser load.
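As a toy illustration of that pattern (plain C++ threads, nothing
deal.II-specific, all sizes made up): a phase whose work is split across all
cores, followed by one that runs on a single thread. While the second phase
runs, one core is at 100% and the rest idle, which drags the average
per-core load below 100%.

  #include <algorithm>
  #include <cstddef>
  #include <numeric>
  #include <thread>
  #include <vector>

  int main()
  {
    const unsigned int n_threads =
      std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> data(10'000'000, 1.0);

    // Phase 1: parallelized, all cores busy.
    std::vector<std::thread> pool;
    for (unsigned int t = 0; t < n_threads; ++t)
      pool.emplace_back([&data, t, n_threads]() {
        for (std::size_t i = t; i < data.size(); i += n_threads)
          data[i] *= 2.0;
      });
    for (std::thread &th : pool)
      th.join();

    // Phase 2: not parallelized, a single core does all the work
    // while the others idle.
    const double sum = std::accumulate(data.begin(), data.end(), 0.0);
    return (sum > 0.) ? 0 : 1;
  }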
> I just tested the step-8 program with PreconditionIdentity(), which showed
> 100% CPU usage on all 8 CPUs. The results follow. Assuming that having no
> preconditioner only slows things down, maybe getting 3 times the CPU power
> will make up for it. I haven't checked solve times yet. The preconditioner
> for step-8 was PreconditionSSOR<> with relaxation parameter = 1.2. Is there
> an optimal preconditioner/relaxation parameter for 3d elasticity problems
> that you know of? Or can they only be determined by trial and error?
1.2 seems to be what a lot of people use.
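For reference, step-8's solve() is essentially the following (condensed; the
exact SolverControl settings differ a bit between versions), with the
PreconditionIdentity variant you tested shown as the commented-out
alternative:

  #include <deal.II/lac/precondition.h>
  #include <deal.II/lac/solver_cg.h>
  #include <deal.II/lac/solver_control.h>
  #include <deal.II/lac/sparse_matrix.h>
  #include <deal.II/lac/vector.h>

  using namespace dealii;

  void solve(const SparseMatrix<double> &system_matrix,
             Vector<double>             &solution,
             const Vector<double>       &system_rhs)
  {
    SolverControl            solver_control(1000, 1e-12);
    SolverCG<Vector<double>> cg(solver_control);

    // Variant 1: SSOR with relaxation parameter 1.2, as in step-8:
    PreconditionSSOR<SparseMatrix<double>> preconditioner;
    preconditioner.initialize(system_matrix, 1.2);

    // Variant 2: no preconditioning. Every operation CG then performs
    // is parallelized, at the price of more iterations:
    //   PreconditionIdentity preconditioner;

    cg.solve(system_matrix, solution, system_rhs, preconditioner);
  }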
As for thread use: if you use PreconditionIdentity, *all* major operations
that CG calls are parallelized. On the other hand, using PreconditionSSOR, you
will spend at least 50% of your time in the preconditioner, but SSOR is a
sequential method where you need to compute the update for one vector element
before you can move to the next. So it cannot be parallelized, and
consequently your average thread load will be less than 100%.
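To see where that dependence comes from, here is a minimal sketch of one
forward SOR sweep over a matrix in (hypothetical) CSR storage, not deal.II's
actual implementation; SSOR then adds a mirror-image backward sweep:

  #include <cstddef>
  #include <vector>

  // One forward SOR sweep: the update of x[row] uses values x[col] that
  // were already updated earlier in this same sweep (col < row). That is
  // the dependence that forces sequential execution.
  void sor_forward_sweep(
    const std::vector<std::size_t> &row_start,  // CSR row pointers
    const std::vector<std::size_t> &col_index,  // CSR column indices
    const std::vector<double>      &value,      // CSR nonzero values
    const std::vector<double>      &diag,       // diagonal entries A(i,i)
    const std::vector<double>      &b,
    std::vector<double>            &x,
    const double                    relaxation) // e.g. 1.2
  {
    for (std::size_t row = 0; row < x.size(); ++row)
      {
        double s = b[row];
        for (std::size_t k = row_start[row]; k < row_start[row + 1]; ++k)
          if (col_index[k] != row)
            s -= value[k] * x[col_index[k]];
        x[row] += relaxation * (s / diag[row] - x[row]);
      }
  }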
Neither of these is a good preconditioner in the grand scheme of things if
you envision going to large problems. For those, you ought to use variations
of the multigrid method.
> Wolfgang, what I meant by efficiency was that the CPU usage in the threads
> for Step-17 NEW and OLD decreased with larger #DOFs or cycle #'s.
If the load decreased for both codes, I would attribute this to memory
traffic. If the problem is small enough, much of it will fit into the caches
of the processor/cores, and so you get high throughput. If the problem becomes
bigger, processors wait for data for longer. Waiting is, IIRC, still counted
as processor load, but it may make some operations that are not parallelized
take longer than those that are parallelized, and so overall lead to a lower
average thread load.
But that's only a theory that would require a lot more digging to verify.
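If one wanted to dig, a crude first check (a standalone sketch, nothing
deal.II-specific) would be to time a memory-bound vector update once with a
working set that fits in cache and once with one that doesn't:

  #include <chrono>
  #include <cstddef>
  #include <cstdio>
  #include <vector>

  int main()
  {
    // ~128 kB working set (fits in cache) vs ~256 MB (does not).
    for (const std::size_t n : {std::size_t(1) << 13, std::size_t(1) << 24})
      {
        std::vector<double> x(n, 1.0), y(n, 2.0);
        const auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100; ++rep)
          for (std::size_t i = 0; i < n; ++i)
            y[i] += 1.2 * x[i]; // axpy-like, bandwidth-bound when large
        const auto t1 = std::chrono::steady_clock::now();
        const double ns =
          std::chrono::duration<double, std::nano>(t1 - t0).count();
        // Printing y[0] keeps the compiler from optimizing the loop away.
        std::printf("n=%zu: %.3f ns/element (y[0]=%g)\n",
                    n, ns / (100.0 * n), y[0]);
      }
  }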
Best
W.
--
------------------------------------------------------------------------
Wolfgang Bangerth email: bange...@colostate.edu
www: http://www.math.tamu.edu/~bangerth/