Re: [DOLFIN-dev] Results: Parallel speedup

Garth N. Wells Tue, 22 Sep 2009 00:03:06 -0700


Anders Logg wrote:
> On Tue, Sep 22, 2009 at 08:11:27AM +0200, Niclas Jansson wrote:
>> Matthew Knepley <[email protected]> writes:
>>
>>> On Mon, Sep 21, 2009 at 2:37 PM, Anders Logg <[email protected]> wrote:
>>>
>>>     Johan and I have set up a benchmark for parallel speedup in
>>>
>>>      bench/fem/speedup
>>>
>>>     Here are some preliminary results:
>>>
>>>      Procs    |  Assemble  Assemble + solve
>>>      --------------------------------------
>>>      1        |         1                 1
>>>      2        |    1.4351            4.0785
>>>      4        |    2.3763            6.9076
>>>      8        |    3.7458            9.4648
>>>      16       |    6.3143            19.369
>>>      32       |    7.6207            33.699
>>>
>>> These numbers are very very strange for a number of reasons:
>>>
>>> 1) Assemble should scale almost perfectly. Something is wrong here.
>>>
>>> 2) Solve should scale like a matvec, which should not be this good,
>>>     especially on a cluster with a slow network. I would expect 85% or so.
>>>
>>> 3) If any of these are dual core, then it really does not make sense since
>>>     it should be bandwidth limited.
>>>
>>>   Matt
>>>  
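
One way to make the oddities above concrete is to convert the quoted speedups into parallel efficiency (speedup divided by process count). The sketch below is only an illustration: the numbers are copied from the table in this thread, and the helper names (efficiency, efficiency_table) are made up for the example.

```python
# Speedups copied from the table quoted in this thread:
# process count -> (assemble, assemble + solve)
speedup = {
    1: (1.0, 1.0),
    2: (1.4351, 4.0785),
    4: (2.3763, 6.9076),
    8: (3.7458, 9.4648),
    16: (6.3143, 19.369),
    32: (7.6207, 33.699),
}


def efficiency(procs, s):
    """Parallel efficiency: measured speedup divided by the ideal speedup."""
    return s / procs


def efficiency_table(data):
    """Map each process count to the efficiencies of both columns."""
    return {p: tuple(efficiency(p, s) for s in pair) for p, pair in data.items()}


for p, (e_asm, e_both) in efficiency_table(speedup).items():
    # An efficiency above 1 means superlinear speedup.
    flag = "  superlinear" if e_both > 1.0 else ""
    print(f"{p:>4} | {e_asm:8.3f} {e_both:8.3f}{flag}")
```

With these numbers the assemble efficiency falls to roughly 0.24 on 32 processes, while assemble + solve stays above 1 for every process count greater than one.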
>> So true, these numbers are very strange. I usually get a 6-7x speedup
>> for the icns solver in Unicorn on a crappy Intel bus-based 2 x quad core.
>>
>> From a quick look at the code: is the mesh only 64 x 64? This could (does) explain
>> the poor assembly performance on 32 processes (^-^)
> 
> It's 64 x 64 x 64 (3D). What would be a reasonable size?
> 
>> Also, I think the timing is done in the wrong way. Without barriers, it
>> would never measure the true parallel runtime.
>>
>> MPI_Barrier
>> MPI_Wtime
>> number crunching
>> MPI_Barrier
>> MPI_Wtime
>>
>> (Well, assemble is more or less an implicit barrier due to apply(), but I
>> don't think the solvers have any implicit barriers.)
> 
> I thought there were implicit barriers in both assemble (apply) and
> the solver, but adding barriers would not hurt.
>


The assembly timing should be split into the 'assembly over cells' phase 
and the time to call apply().
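
A rough sketch of how the barrier/clock pattern quoted above could be applied to each phase separately. This is not the code in bench/fem/speedup. The names timed_phase, SerialComm, assemble_cells and apply_tensor are hypothetical placeholders, and the communicator is assumed to be any object with a Barrier() method (for example MPI.COMM_WORLD from mpi4py).

```python
import time


def timed_phase(comm, work, clock=time.perf_counter):
    """Barrier, clock, work, barrier, clock. Returns (result, elapsed seconds)."""
    comm.Barrier()  # make sure all processes start together
    t0 = clock()
    result = work()
    comm.Barrier()  # wait for the slowest process before stopping the clock
    return result, clock() - t0


class SerialComm:
    """Stand-in communicator so the sketch runs on one process without MPI."""

    def Barrier(self):
        pass


def assemble_cells():
    # Hypothetical placeholder for the 'assembly over cells' loop.
    return sum(i * i for i in range(100000))


def apply_tensor():
    # Hypothetical placeholder for the call to apply().
    return None


# With mpi4py one would use instead:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD   (and clock=MPI.Wtime)
comm = SerialComm()
_, t_cells = timed_phase(comm, assemble_cells)
_, t_apply = timed_phase(comm, apply_tensor)
print(f"assembly over cells: {t_cells:.6f} s, apply(): {t_apply:.6f} s")
```

Reporting the two times separately should show whether the poor assembly scaling comes from the cell loop or from the communication in apply().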

Garth

> --
> Anders
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> DOLFIN-dev mailing list
> [email protected]
> http://www.fenics.org/mailman/listinfo/dolfin-dev

