Matthew Knepley wrote:
> On Mon, Jul 21, 2008 at 4:48 PM, Anders Logg <[EMAIL PROTECTED]> wrote:
>> On Mon, Jul 21, 2008 at 04:37:28PM -0500, Matthew Knepley wrote:
>>> On Mon, Jul 21, 2008 at 4:35 PM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>>> On Mon, Jul 21, 2008 at 04:03:11PM -0500, Matthew Knepley wrote:
>>>>> On Mon, Jul 21, 2008 at 3:55 PM, Matthew Knepley <[EMAIL PROTECTED]> 
>>>>> wrote:
>>>>>> On Mon, Jul 21, 2008 at 3:50 PM, Garth N. Wells <[EMAIL PROTECTED]> 
>>>>>> wrote:
>>>>>>>
>>>>>>> Anders Logg wrote:
>>>>>>>> On Mon, Jul 21, 2008 at 01:48:23PM +0100, Garth N. Wells wrote:
>>>>>>>>> Anders Logg wrote:
>>>>>>>>>> I have updated the assembly benchmark to include also MTL4, see
>>>>>>>>>>
>>>>>>>>>>    bench/fem/assembly/
>>>>>>>>>>
>>>>>>>>>> Here are the current results:
>>>>>>>>>>
>>>>>>>>>> Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>>>>>>> -------------------------------------------------------------------------------------------------------------
>>>>>>>>>> uBLAS               |        9.0789    0.45645     3.8042     8.0736      14.937         9.2507        3.8455
>>>>>>>>>> PETSc               |        7.7758    0.42798     3.5483     7.3898      13.945         8.1632         3.258
>>>>>>>>>> Epetra              |        8.9516    0.45448     3.7976     8.0679      15.404         9.2341        3.8332
>>>>>>>>>> MTL4                |        8.9729    0.45554     3.7966     8.0759       14.94         9.2568        3.8658
>>>>>>>>>> Assembly            |         7.474    0.43673     3.7341     8.3793      14.633         7.6695        3.3878
>>>>>>>>>>
>>>>>>>
>>>>>>> I specified a maximum of 30 nonzeros per row in MTL4Matrix, and the
>>>>>>> results change quite a bit:
>>>>>>>
>>>>>>>  Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>>>> -------------------------------------------------------------------------------------------------------------
>>>>>>>  uBLAS               |        7.1881    0.32748     2.7633     5.8311      10.968         7.0735        2.8184
>>>>>>>  PETSc               |        5.7868    0.30673     2.5489     5.2344      9.8896          6.069        2.3661
>>>>>>>  MTL4                |        2.8641    0.18339     1.6628     2.6811      2.8519         3.4843       0.85029
>>>>>>>  Assembly            |        5.5564    0.30896     2.6858     5.9675      10.622         5.7144        2.4519
>>>>>>>
>>>>>>>
>>>>>>> MTL4 is a lot faster in all cases.
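[For reference, the slot size mentioned above maps onto MTL4's inserter
roughly as in this minimal C++ sketch. It assumes MTL4's inserter API
(compressed2D, matrix::inserter, update_plus), not the actual DOLFIN
MTL4Matrix wrapper code:

    #include <boost/numeric/mtl/mtl.hpp>

    int main()
    {
      const int n = 1000;
      mtl::compressed2D<double> A(n, n);
      {
        // Reserve 30 slots per row up front; rows that overflow the
        // slot size fall back to slower auxiliary storage.
        mtl::matrix::inserter<mtl::compressed2D<double>,
                              mtl::operations::update_plus<double> > ins(A, 30);
        ins[0][0] << 1.0;  // accumulate, as in finite element assembly
      }  // A is compressed when the inserter goes out of scope
      return 0;
    }
]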
>>>>> Okay, if you run KSP ex2 (Poisson 2D) and add a logging stage that
>>>>> times assembly (I checked it in to petsc-dev), then 1M unknowns takes
>>>>> about 1 second:
>>>>>
>>>>>   Matrix Object:
>>>>>     type=seqaij, rows=1000000, cols=1000000
>>>>>     total: nonzeros=4996000, allocated nonzeros=5000000
>>>>>       not using I-node routines
>>>>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>>>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>>>>  0:      Main Stage: 1.4997e+00  56.3%  3.8891e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.200e+01  51.2%
>>>>>  1:        Assembly: 1.1648e+00  43.7%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>>>
>>>>> I just cut the solve off. Thus all those numbers are extremely fishy.
>>>>>
>>>>>   Matt
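[For anyone reproducing the timing above: the extra stage comes from
PETSc's standard profiling calls, roughly as in this minimal C++ sketch.
The PetscLogStageRegister argument order has changed between PETSc
releases; this follows the newer convention, and error checking is
omitted:

    #include <petsc.h>

    int main(int argc, char** argv)
    {
      PetscInitialize(&argc, &argv, NULL, NULL);

      PetscLogStage stage;
      PetscLogStageRegister("Assembly", &stage);

      PetscLogStagePush(stage);
      /* ... MatSetValues loop + MatAssemblyBegin/End go here ... */
      PetscLogStagePop();

      /* Run with -log_summary to get the per-stage table above. */
      PetscFinalize();
      return 0;
    }
]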
>>>> We shouldn't trust those numbers just yet. Some of it may be Python
>>>> overhead (calling the FFC JIT compiler, etc.).
>>>>
>>>> Does 1M unknowns mean a unit square divided into 2x1000x1000 right
>>>> triangles?
>>> It's FD Poisson, which gives the same sparsity and values as P1 Poisson, so
>>> it's a 1000x1000 quadrilateral grid. This was just to time insertion.
>>>
>>>   Matt
>> But this is a different problem. Since you know the sparsity pattern a
>> priori, you may be able to (i) not compute the sparsity pattern, (ii)
> 
> No, we only allocate correctly here.
>

Matt,

Is there much of a performance difference with MatSeqAIJSetPreallocation 
between setting the maximum number of nonzeros per row (PetscInt nz) and 
setting the number of nonzeros for each row (PetscInt nnz[]) when the 
number of nonzeros per row doesn't differ greatly?
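
Concretely, the two variants I mean are these (a minimal C++ sketch 
against current PETSc signatures, with made-up sizes and no error 
checking):

    #include <petsc.h>

    int main(int argc, char** argv)
    {
      PetscInitialize(&argc, &argv, NULL, NULL);

      Mat A;
      const PetscInt n = 4;
      MatCreate(PETSC_COMM_SELF, &A);
      MatSetSizes(A, n, n, n, n);
      MatSetType(A, MATSEQAIJ);

      /* Variant 1: one upper bound nz for every row (nnz = NULL). */
      MatSeqAIJSetPreallocation(A, 30, NULL);

      /* Variant 2: exact per-row counts; nz is then ignored:
       *   PetscInt nnz[] = {2, 3, 3, 2};
       *   MatSeqAIJSetPreallocation(A, 0, nnz);
       */

      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }

My understanding is that as long as no row exceeds the scalar bound, 
insertion cost should be identical and the nnz[] form only tightens 
memory use, but I'd like to be sure.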

Garth


>> compute the entries more efficiently, (iii) not compute the
>> local-to-global mapping, and (iv) insert the entries more efficiently.
> 
> Insertion is the same and we compute the same mapping we always use.
> I think you guys overcompute for the l2g.
> 
>   Matt
> 
>> Our timings include all these steps + Python overhead. I'm going to
>> rewrite it in C++ so we can eliminate that source of uncertainty.
>>
>> --
>> Anders
>>

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
