Pete,

Bruno, I assumed that the thread with 100% CPU usage was somehow feeding the others in step-8.

It's more like this: for some functions, we split the work across as many threads as there are CPUs. But the next function you call may not be parallelized, in which case everything runs on a single thread. On average, that one thread shows a load of 100% whereas the others show less.


I just tested the step-8 program with PreconditionIdentity(), which showed 100%
CPU usage on all 8 CPUs. The results follow. Assuming that having no preconditioner
only slows things down, maybe getting 3 times the CPU power will make up for it. I
haven't checked solve times yet. The preconditioner for step-8 was
PreconditionSSOR<> with relaxation parameter = 1.2. Is there an optimal
preconditioner/relaxation parameter for 3d elasticity problems that you know
of? Or is their determination only by trial and error?

1.2 seems to be what a lot of people use.

As for thread use: if you use PreconditionIdentity, *all* major operations that CG calls are parallelized. On the other hand, using PreconditionSSOR, you will spend at least 50% of your time in the preconditioner, but SSOR is a sequential method in which you need to compute the update for one vector element before you can move on to the next. It therefore cannot be parallelized, and consequently your average thread load will be less than 100%.

Neither of these is a good preconditioner in the big scheme of things if you envision going to large problems. For those, you ought to use variations of the multigrid method.


Wolfgang, what I meant by efficiency was that the CPU usage in the threads for
step-17 NEW and OLD decreased as the number of DoFs (i.e., the cycle number) grew.

If the load decreased for both codes, I would attribute this to memory traffic. If the problem is small enough, much of it fits into the caches of the processor/cores, and so you get high throughput. If the problem becomes bigger, the processors wait longer for data. Waiting is, IIRC, still counted as processor load, but it may make operations that are not parallelized take longer relative to those that are, and so overall lead to a lower average thread load.

But that's only a theory that would require a lot more digging to verify.

Best
 W.


--
------------------------------------------------------------------------
Wolfgang Bangerth               email:            bange...@colostate.edu
                                www: http://www.math.tamu.edu/~bangerth/

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
