Wolfgang Bangerth wrote:
>> How well do different assembly techniques scale? I have 3 different
>> assembly routines in my code:
>> 1. Serial version
>> 2. Parallel version based on the old deal.ii threading mechanism
>> 3. Parallel version based on the new deal.ii threading mechanism with
>> TBB. I've done some, admittedly rather crude, timing tests on my dual
>> core PC and see no difference whatsoever.
> 
> You don't say what exactly you were timing: the entire program, or just the 
> assembly routine? Wall clock time or CPU time?
> 
> In our experiments, using the TBB definitely helps, but it doesn't provide 
> perfect scaling. Using threads scales better if the system is empty and if 
> all threads do exactly the same, but the TBB can schedule work.

I've done some more thorough timing tests, and here is what I got.
The four numbers shown are:
CPU time / wall-clock time / wall-clock time spent in the solve+update
routines (and its percentage of the total) / wall-clock time spent in
the assembly routines (and its percentage of the total).
All timings were taken with the deal.II timing functions on an 8-core
Linux machine. Times are in seconds. The solver used is UMFPACK.
Assembly using TBB:
   744 / 133 / 114.38 (86%) / 15.96 (12%)
Serial assembly:
   750 / 225 / 114.75 (51%) / 108.00 (48%)
Assembly using the "old style" threading:
   854 / 157 / 114.61 (73%) / 39.25 (25%)

As you can see, both TBB and the "old style" threading provide a clear
speed-up, but TBB is the winner with an assembly speed-up of about 6.8
over the serial version. The "old style" threading only manages about
2.75, but I use a lot of mutexes there, so that is no surprise.
Another observation: both the serial and the "old style" threading
versions launch the same number of threads (8), which I could see in
htop, while the TBB version runs about twice as many threads as there
are cores. I am not sure what the serial version is doing with that
many threads.
All of this was done on a fixed grid, i.e. without refinement;
refinement adds a lot of overhead.
In conclusion, I must say I am actually quite happy with how parallel
assembly works. Now I just need to find a way to speed up the solver:
are there any parallel (non-MPI) sparse solvers? Speeding up refinement
would also help...

Cheers,
Victor Prosolin.


_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii