On Tue, Sep 22, 2009 at 08:11:27AM +0200, Niclas Jansson wrote:
> Matthew Knepley <[email protected]> writes:
>
> > On Mon, Sep 21, 2009 at 2:37 PM, Anders Logg <[email protected]> wrote:
> >
> > > Johan and I have set up a benchmark for parallel speedup in
> > > bench/fem/speedup. Here are some preliminary results (speedup
> > > relative to one process):
> > >
> > >   Processes | Assemble   Assemble + solve
> > >   ----------------------------------------
> > >           1 | 1          1
> > >           2 | 1.4351     4.0785
> > >           4 | 2.3763     6.9076
> > >           8 | 3.7458     9.4648
> > >          16 | 6.3143     19.369
> > >          32 | 7.6207     33.699
> >
> > These numbers are very very strange for a number of reasons:
> >
> > 1) Assemble should scale almost perfectly. Something is wrong here.
> >
> > 2) Solve should scale like a matvec, which should not be this good,
> >    especially on a cluster with a slow network. I would expect 85% or so.
> >
> > 3) If any of these are dual core, then it really does not make sense,
> >    since it should be bandwidth limited.
> >
> > Matt
>
> So true, these numbers are very strange. I usually get 6-7 times speedup
> for the icns solver in Unicorn on a crappy Intel bus-based 2 x quad core.
>
> A quick look at the code: is the mesh only 64 x 64? This could (does)
> explain the poor assembly performance on 32 processes (^-^)
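To put Matt's points in numbers: parallel efficiency is speedup divided by process count. The small self-contained C++ sketch below uses only the figures quoted in the table above; it is illustrative and not part of the benchmark code.

    #include <cstdio>

    int main()
    {
      // Speedup figures quoted in the table above.
      const int    procs[]    = {1, 2, 4, 8, 16, 32};
      const double assemble[] = {1.0, 1.4351, 2.3763, 3.7458, 6.3143, 7.6207};
      const double both[]     = {1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699};

      std::printf("procs   assemble eff.   assemble+solve eff.\n");
      for (int i = 0; i < 6; ++i)
      {
        // Parallel efficiency = speedup / number of processes.
        std::printf("%5d   %12.1f%%   %18.1f%%\n",
                    procs[i],
                    100.0 * assemble[i] / procs[i],
                    100.0 * both[i] / procs[i]);
      }
      return 0;
    }

At 32 processes this gives roughly 24% efficiency for assembly alone and about 105% for assemble + solve, i.e. superlinear, which is why the assembly numbers look poor and the solve numbers suspiciously good.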
It's 64 x 64 x 64 (3D). What would be a reasonable size?

> Also, I think the timing is done in the wrong way. Without barriers, it
> would never measure the true parallel runtime:
>
>   MPI_Barrier
>   MPI_Wtime
>   number crunching
>   MPI_Barrier
>   MPI_Wtime
>
> (Well, assemble is more or less an implicit barrier due to apply(), but I
> don't think the solvers have any kind of implicit barrier.)

I thought there were implicit barriers in both assemble (apply()) and the
solver, but adding barriers would not hurt.

--
Anders
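For completeness, here is a minimal sketch of the barrier-based timing pattern described above, using the plain MPI C API; do_assemble_and_solve() is a hypothetical placeholder for the benchmark's actual work, only the barrier and timer calls around it are the point.

    #include <mpi.h>
    #include <cstdio>

    // Hypothetical stand-in for the timed kernel (assembly and/or solve in
    // the real benchmark); replace with the actual computation.
    static void do_assemble_and_solve()
    {
      // ... number crunching ...
    }

    int main(int argc, char* argv[])
    {
      MPI_Init(&argc, &argv);

      // Synchronize before starting the clock so no process starts its
      // timer early while others are still setting up.
      MPI_Barrier(MPI_COMM_WORLD);
      const double t0 = MPI_Wtime();

      do_assemble_and_solve();

      // Synchronize again before stopping the clock so the measurement
      // reflects the true parallel runtime, i.e. the time until the
      // slowest process has finished.
      MPI_Barrier(MPI_COMM_WORLD);
      const double t1 = MPI_Wtime();

      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0)
        std::printf("Wall-clock time: %.6f s\n", t1 - t0);

      MPI_Finalize();
      return 0;
    }

Compiled with mpicxx and run with e.g. mpirun -np 32, the time reported on rank 0 includes waiting at the final barrier, so it measures the slowest process; the speedup on p processes is then T_1 / T_p.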
