On Tue, Jul 22, 2008 at 3:30 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>
>
> Matthew Knepley wrote:
>>
>> On Mon, Jul 21, 2008 at 4:48 PM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>>
>>> On Mon, Jul 21, 2008 at 04:37:28PM -0500, Matthew Knepley wrote:
>>>>
>>>> On Mon, Jul 21, 2008 at 4:35 PM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> On Mon, Jul 21, 2008 at 04:03:11PM -0500, Matthew Knepley wrote:
>>>>>>
>>>>>> On Mon, Jul 21, 2008 at 3:55 PM, Matthew Knepley <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Jul 21, 2008 at 3:50 PM, Garth N. Wells <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Anders Logg wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Jul 21, 2008 at 01:48:23PM +0100, Garth N. Wells wrote:
>>>>>>>>>>
>>>>>>>>>> Anders Logg wrote:
>>>>>>>>>>>
>>>>>>>>>>> I have updated the assembly benchmark to include also MTL4, see
>>>>>>>>>>>
>>>>>>>>>>> bench/fem/assembly/
>>>>>>>>>>>
>>>>>>>>>>> Here are the current results:
>>>>>>>>>>>
>>>>>>>>>>> Assembly benchmark | Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>>>>>>>> -------------------------------------------------------------------------------------------------------------
>>>>>>>>>>> uBLAS              |       9.0789    0.45645     3.8042     8.0736      14.937         9.2507        3.8455
>>>>>>>>>>> PETSc              |       7.7758    0.42798     3.5483     7.3898      13.945         8.1632         3.258
>>>>>>>>>>> Epetra             |       8.9516    0.45448     3.7976     8.0679      15.404         9.2341        3.8332
>>>>>>>>>>> MTL4               |       8.9729    0.45554     3.7966     8.0759       14.94         9.2568        3.8658
>>>>>>>>>>> Assembly           |        7.474    0.43673     3.7341     8.3793      14.633         7.6695        3.3878
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> I specified in MTL4Matrix maximum 30 nonzeroes per row, and the
>>>>>>>> results change quite a bit,
>>>>>>>>
>>>>>>>> Assembly benchmark | Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>>>>> -------------------------------------------------------------------------------------------------------------
>>>>>>>> uBLAS              |       7.1881    0.32748     2.7633     5.8311      10.968         7.0735        2.8184
>>>>>>>> PETSc              |       5.7868    0.30673     2.5489     5.2344      9.8896          6.069        2.3661
>>>>>>>> MTL4               |       2.8641    0.18339     1.6628     2.6811      2.8519         3.4843       0.85029
>>>>>>>> Assembly           |       5.5564    0.30896     2.6858     5.9675      10.622         5.7144        2.4519
>>>>>>>>
>>>>>>>>
>>>>>>>> MTL4 is a lot faster in all cases.
>>>>>>
>>>>>> Okay, if you run KSP ex2 (Poisson 2D) and add a logging stage that
>>>>>> times assembly (I checked it in to petsc-dev)
>>>>>> then 1M unknowns takes about 1s
>>>>>>
>>>>>> Matrix Object:
>>>>>>   type=seqaij, rows=1000000, cols=1000000
>>>>>>   total: nonzeros=4996000, allocated nonzeros=5000000
>>>>>>     not using I-node routines
>>>>>> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>>>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>>>>>  0:     Main Stage: 1.4997e+00  56.3%  3.8891e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.200e+01  51.2%
>>>>>>  1:       Assembly: 1.1648e+00  43.7%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>>>>
>>>>>> I just cut the solve off. Thus all those numbers are extremely fishy.
>>>>>>
>>>>>> Matt
>>>>>
>>>>> We shouldn't trust those numbers just yet. Some of it may be Python
>>>>> overhead (calling the FFC JIT compiler etc).
>>>>>
>>>>> Does 1M unknowns mean a unit square divided into 2x1000x1000 right
>>>>> triangles?
>>>>
>>>> It's FD Poisson, which gives the same sparsity and values as P1 Poisson,
>>>> so it's a 1000x1000 quadrilateral grid. This was just to time insertion.
>>>>
>>>> Matt
>>>
>>> But this is a different problem. Since you know the sparsity pattern a
>>> priori, you may be able to (i) not compute the sparsity pattern, (ii)
>>
>> No, we only allocate correctly here.
>>
>
> Matt,
>
> Is there much of a performance difference with MatSeqAIJSetPreallocation
> between setting the maximum number of non-zeroes per row (PetscInt nz), and
> setting the number of non-zeroes for each row (PetscInt nnz[]) when the
> number of non-zeroes per row doesn't differ greatly?

There should be no difference at all.

Matt

> Garth
>
>
>>> compute the entries more efficiently, (iii) not compute the
>>> local-to-global mapping, and (iv) insert the entries more efficiently.
>>
>> Insertion is the same and we compute the same mapping we always use.
>> I think you guys overcompute for the l2g.
>>
>> Matt
>>
>>> Our timings include all these steps + Python overhead. I'm going to
>>> rewrite it in C++ so we can eliminate that source of uncertainty.
>>>
>>> --
>>> Anders
>>>
>>> _______________________________________________
>>> DOLFIN-dev mailing list
>>> [email protected]
>>> http://www.fenics.org/mailman/listinfo/dolfin-dev
>>>
>>
>
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
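
Editorial sketch, not part of the original exchange: the PETSc log quoted in the thread reports nonzeros=4996000 against allocated nonzeros=5000000 for the 1000x1000 FD Poisson grid. The short Python count below reproduces both figures under the assumption of a standard 5-point stencil with no coupling across the boundary. It does not call PETSc; the function names stencil_row_counts and allocation_summary are hypothetical, and the "nz style" and "nnz[] style" labels only mirror the two MatSeqAIJSetPreallocation arguments named in the question.

```python
def stencil_row_counts(n):
    """Nonzeros per matrix row for a 5-point stencil on an n x n grid (assumed layout)."""
    counts = []
    for i in range(n):
        for j in range(n):
            # One diagonal entry plus one entry per neighbour that lies inside the grid.
            neighbours = (i > 0) + (i < n - 1) + (j > 0) + (j < n - 1)
            counts.append(1 + neighbours)
    return counts


def allocation_summary(counts):
    """Return (entries with exact per-row counts, entries with one upper bound per row)."""
    exact = sum(counts)                   # nnz[] style: each row gets exactly what it needs
    uniform = max(counts) * len(counts)   # nz style: every row gets the maximum row length
    return exact, uniform


counts = stencil_row_counts(1000)
exact, uniform = allocation_summary(counts)
print(exact, uniform)
```

For this grid the single upper bound over-allocates 4000 of 5000000 entries (0.08%), which is consistent with the reply that the two preallocation styles should behave the same when row lengths are nearly uniform.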
