Wolfgang Bangerth wrote:
>> How well do different assembly techniques scale? I have 3 different
>> assembly routines in my code:
>> 1. Serial version
>> 2. Parallel version based on the old deal.ii threading mechanism
>> 3. Parallel version based on the new deal.ii threading mechanism with
>> TBB. I've done some, admittedly rather crude, timing tests on my dual
>> core PC and see no difference whatsoever.
>
> You don't say what exactly you were timing: the entire program, or just the
> assembly routine? Wall clock time or CPU time?
>
> In our experiments, using the TBB definitely helps, but it doesn't provide
> perfect scaling. Using threads scales better if the system is empty and if
> all threads do exactly the same, but the TBB can schedule work.
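The quoted question (wall-clock or CPU time?) matters a lot for threaded code. Process CPU time is summed over all threads, so a well-parallelized run can report several times more CPU time than wall-clock time, while time a thread spends blocked (waiting on a mutex, on I/O, or sleeping) advances the wall clock but not the CPU clock. Below is a minimal Python sketch of measuring both around a piece of work. It uses only the standard library's `time.process_time()` and `time.perf_counter()`; it is not the deal.II timing facility, and `measure` is a hypothetical helper name chosen for illustration. It uses `time.sleep` to show the blocked case, because pure-Python threads cannot make CPU time exceed wall time (the GIL serializes them); that effect needs native threads such as TBB's.

```python
import time


def measure(work):
    """Run work() once and return (cpu_seconds, wall_seconds).

    Hypothetical helper for illustration. time.process_time() is the
    user+system CPU time of the whole process (summed over all threads);
    time.perf_counter() is a monotonic wall-clock timer.
    """
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    work()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


if __name__ == "__main__":
    # Blocked time: the wall clock advances, CPU time stays near zero.
    cpu, wall = measure(lambda: time.sleep(0.5))
    print(f"sleep:     cpu {cpu:.3f} s, wall {wall:.3f} s")

    # Busy single-threaded work: on an otherwise idle machine,
    # CPU time and wall time come out roughly equal.
    cpu, wall = measure(lambda: sum(i * i for i in range(2_000_000)))
    print(f"busy loop: cpu {cpu:.3f} s, wall {wall:.3f} s")
```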
I've done some more thorough timing tests, and here is what I got. All timings were taken with the deal.II timing functions on an 8-core Linux machine; the solver is UMFPACK, and all times are in seconds. The columns are: total CPU time, total wall-clock time, wall-clock time spent in the solve+update routines (with its share of the total wall-clock time), and wall-clock time spent in the assembly routines (with its share of the total wall-clock time).

                             CPU    Wall    Solve+update     Assembly
  Assembly using TBB         744     133    114.38 (86%)     15.96 (12%)
  Serial assembly            750     225    114.75 (51%)    108    (48%)
  "Old style" threading      854     157    114.61 (73%)     39.25 (25%)

Both TBB and "old style" threading give a clear speed-up, but TBB is the winner, with an assembly speed-up of about 6.75 (108 s vs. 15.96 s)! "Old style" threading only reaches 2.75 (108 s vs. 39.25 s), but my implementation uses a lot of mutexes, so this is not a surprise.

Another observation, from htop: the serial and "old style" threading versions both run the same number of threads (8), while the TBB version runs about twice as many threads as there are cores. I am not sure what the serial version does with that many threads.

All of this was done on a fixed grid, i.e. without refinement. Refinement adds a lot of overhead.

In conclusion, I must say that I am actually quite happy with how parallel assembly works. Now I just have to find a way to speed up the solver: are there any parallel (not MPI-based) sparse solvers? Speeding up refinement would also help...

Cheers,
Victor Prosolin.
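The speed-up figures can be checked directly from the timings reported above. The short Python sketch below, with those timings copied in and hypothetical helper names (`speedup`, `share`, `amdahl_bound`), also makes the solver bottleneck explicit: in the serial run everything except assembly takes 225 - 108 = 117 s, so by Amdahl's law even infinitely fast assembly could cut the wall-clock time by at most a factor of 225/117, about 1.92.

```python
# Wall-clock timings (seconds) as reported in the message, 8-core machine.
TIMINGS = {
    "serial":    {"wall": 225.0, "solve": 114.75, "assembly": 108.0},
    "tbb":       {"wall": 133.0, "solve": 114.38, "assembly": 15.96},
    "old_style": {"wall": 157.0, "solve": 114.61, "assembly": 39.25},
}


def speedup(baseline: float, improved: float) -> float:
    """Ratio of baseline time to improved time (> 1 means faster)."""
    return baseline / improved


def share(part: float, total: float) -> float:
    """Percentage of the total wall-clock time spent in one phase."""
    return 100.0 * part / total


def amdahl_bound(total: float, parallel_part: float) -> float:
    """Best possible overall speed-up if parallel_part shrank to zero."""
    return total / (total - parallel_part)


if __name__ == "__main__":
    serial = TIMINGS["serial"]
    for name in ("tbb", "old_style"):
        t = TIMINGS[name]
        print(f"{name}: assembly speed-up "
              f"{speedup(serial['assembly'], t['assembly']):.2f}, "
              f"overall speed-up {speedup(serial['wall'], t['wall']):.2f}, "
              f"assembly share {share(t['assembly'], t['wall']):.0f}%")
    print("upper bound from faster assembly alone: "
          f"{amdahl_bound(serial['wall'], serial['assembly']):.2f}")
```

This reports an assembly speed-up of 6.77 for TBB and 2.75 for the "old style" version, matching the quoted figures up to rounding, but an overall wall-clock speed-up of only 1.69 for TBB, which is why the solver is now the part worth attacking.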
Graduate Student, Department of Physics and Astronomy, University of Calgary
Tel. (403) 220-6340
_______________________________________________
dealii mailing list
http://poisson.dealii.org/mailman/listinfo/dealii
