On Tue, Jul 22, 2008 at 11:41:26PM +0100, Garth N. Wells wrote:
>
> DOLFIN wrote:
> > One or more new changesets pushed to the primary dolfin repository.
> > A short summary of the last three changesets is included below.
> >
> > changeset: 4491:cb0fdfa3514ab65e67b2f922cf482b4f2aa008eb
> > tag: tip
> > user: Anders Logg <[EMAIL PROTECTED]>
> > date: Tue Jul 22 23:44:14 2008 +0200
> > files: bench/fem/assembly/cpp/main.cpp
> > description:
> > Bug fix in assembly benchmark (don't trust values in previous changeset)
> > and add reassembly benchmark. New preliminary results:
> >
> > Assemble   | Poisson2DP1  Poisson2DP2  Poisson2DP3  THStokes2D  StabStokes2D  Elasticity3D  NSEMomentum3D
> > ---------------------------------------------------------------------------------------------------------
> > uBLAS      | 0.45         3.84         3.77         15.1        3.81          8.8           9.13
> > PETSc      | 0.42         3.6          3.56         14.07       3.2           7.6           7.9
> > Epetra     | 0.45         3.76         3.76         14.94       3.72          8.71          9.06
> > MTL4       | 0.44         3.75         3.75         14.77       3.73          8.75          9.11
> > Assembly   | 0.43         3.78         3.8          14.88       3.36          7.05          7.49
> >
> > Reassemble | Poisson2DP1  Poisson2DP2  Poisson2DP3  THStokes2D  StabStokes2D  Elasticity3D  NSEMomentum3D
> > ---------------------------------------------------------------------------------------------------------
> > uBLAS      | 0.2          0.64         0.64         4.37        1.49          4.39          4.74
> > PETSc      | 0.19         0.54         0.55         3.08        1.06          3.24          3.55
> > Epetra     | 0.2          0.65         0.65         4.41        1.5           4.36          4.71
> > MTL4       | 0.22         0.65         0.64         4.42        1.5           4.38          4.73
> > Assembly   | 0.17         0.53         0.53         2.92        0.89          2.36          2.73
> >
> > From these results, it looks like the AssemblyMatrix backend is the fastest
> > but there may be bugs etc.
> >
>
> I'm getting quite different results.
Strange that we are getting such different results. Did you use any
particular compiler options?

> Assemble   | Poisson2DP1  Poisson2DP2  Poisson2DP3  THStokes2D  StabStokes2D  Elasticity3D  NSEMomentum3D
> ---------------------------------------------------------------------------------------------------------
> uBLAS      | 0.34         2.94         2.88         11.49       2.86          6.67          6.98
> PETSc      | 0.31         2.7          2.71         10.24       2.44          5.72          5.94
> Epetra     | 0.35         2.41         2.39         7.22        2.14          10.88         10.98
> MTL4       | 0.2          1.78         1.79         2.97        0.83          1.99          2.32
> Assembly   | 0.31         2.85         2.89         11.26       2.57          5.46          5.77
>
> Reassemble | Poisson2DP1  Poisson2DP2  Poisson2DP3  THStokes2D  StabStokes2D  Elasticity3D  NSEMomentum3D
> ---------------------------------------------------------------------------------------------------------
> uBLAS      | 0.14         0.47         0.47         3.22        1.1           3.23          3.46
> PETSc      | 0.16         0.42         0.42         2.37        0.81          2.46          2.68
> Epetra     | 0.17         0.43         0.43         2.28        0.82          2.2           2.51
> MTL4       | 0.18         0.49         0.48         1.55        0.85          1.73          1.9
> Assembly   | 0.12         0.43         0.42         2.32        0.68          1.82          2.1
>
> MTL4 is the fastest, which is due in large part to the fact that
> MTL4SparsityPattern doesn't do anything. Once we get the sparsity
> pattern sorted out, I expect PETSc to be very close to MTL4.
>
> I don't think that AssemblyMatrix is particularly interesting other than
> for curiosity, because it's not good for linear algebra.

I still think it's interesting since it allows us to experiment with new
special-purpose backends for assembly, followed by a conversion to one of
the other formats (as we did for uBLAS before). The current implementation
as vector<map<uint, real>> is just an example.

> For Stokes + Taylor-Hood, most of the time is in the generation of the
> sparsity pattern. I tested PETSc and MTL4 earlier today for Taylor-Hood
> by not generating the vector of sets in SparsityPattern and just
> prescribing the maximum number of non-zeroes per row. The assembly was
> much faster, and the difference between PETSc and MTL4 was very small.
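A minimal sketch of the vector<map<uint, real>> idea mentioned above, assuming `real` is `double` and using a hypothetical class name (MapMatrix, not DOLFIN's actual AssemblyMatrix interface): element matrices are summed into one std::map per row during assembly, and the result is copied once into compressed row (CSR) arrays afterwards. Because std::map keeps each row's column indices sorted, the conversion fills the CSR arrays in order in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical assembly-friendly matrix (not DOLFIN's AssemblyMatrix API):
// one ordered map per row, column index -> value.
class MapMatrix
{
public:
  explicit MapMatrix(std::size_t num_rows) : rows_(num_rows) {}

  // Add a dense element matrix A (rows.size() x cols.size(), row-major)
  // at the given global row/column indices, summing into existing entries.
  void add(const std::vector<double>& A,
           const std::vector<std::size_t>& rows,
           const std::vector<std::size_t>& cols)
  {
    assert(A.size() == rows.size() * cols.size());
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
      assert(rows[i] < rows_.size());
      for (std::size_t j = 0; j < cols.size(); ++j)
        rows_[rows[i]][cols[j]] += A[i * cols.size() + j];
    }
  }

  // Copy into compressed row storage. std::map keeps each row's column
  // indices sorted, so the CSR arrays are filled in order in one pass.
  void to_csr(std::vector<std::size_t>& row_ptr,
              std::vector<std::size_t>& col_idx,
              std::vector<double>& values) const
  {
    row_ptr.clear();
    col_idx.clear();
    values.clear();
    row_ptr.push_back(0);
    for (const auto& row : rows_)
    {
      for (const auto& entry : row)
      {
        col_idx.push_back(entry.first);
        values.push_back(entry.second);
      }
      row_ptr.push_back(col_idx.size());
    }
  }

private:
  std::vector<std::map<std::size_t, double>> rows_;
};
```

For example, adding the 1D linear element matrix [1 -1; -1 1] at global indices {0, 1} and then {1, 2} and calling to_csr() gives row_ptr = {0, 2, 5, 7}. The per-entry map insertion is cheap to write but allocates a node per non-zero, which is why a conversion step to a linear-algebra format is still needed.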
>
> I also made a modification of SparsityPattern to work with a 'homemade'
> unsorted set using a vector of vectors. It's a lot faster than using
> std::set in SparsityPattern and can return the number of non-zeroes per
> row. However, it isn't ordered within each row, so it's not very useful
> for initialising sparse uBLAS matrices by filling them in order. I'll
> implement it, so PETSc and MTL4, and probably Epetra, will be
> considerably faster. For uBLAS, I'll revert to the old strategy of
> assembling into a fast-to-assemble matrix and converting that to
> compressed row when apply() is called.

OK, sounds good. Once we've settled on the assembly benchmark, I'd like
to run it on DOLFIN 0.8.0 for reference, so if you modify what we have
now, we will need to do some backporting.

--
Anders
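The 'homemade' unsorted set described above could look roughly like the following sketch. It assumes a plain vector of column-index vectors with a linear duplicate check; the class and method names are hypothetical, not DOLFIN's SparsityPattern. Row lengths give the number of non-zeroes per row directly, and the optional sort_rows() helper only illustrates what in-order filling would additionally require.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparsity pattern (not DOLFIN's SparsityPattern class):
// one unsorted vector of column indices per row.
class UnsortedSparsityPattern
{
public:
  explicit UnsortedSparsityPattern(std::size_t num_rows) : pattern_(num_rows) {}

  // Insert the dense block rows x cols; a linear search skips duplicates,
  // which avoids the per-node allocation of std::set.
  void insert(const std::vector<std::size_t>& rows,
              const std::vector<std::size_t>& cols)
  {
    for (std::size_t r : rows)
    {
      assert(r < pattern_.size());
      std::vector<std::size_t>& entries = pattern_[r];
      for (std::size_t c : cols)
        if (std::find(entries.begin(), entries.end(), c) == entries.end())
          entries.push_back(c);
    }
  }

  // Number of non-zeroes per row, e.g. for preallocating a sparse matrix.
  std::vector<std::size_t> nnz_per_row() const
  {
    std::vector<std::size_t> nnz;
    nnz.reserve(pattern_.size());
    for (const auto& entries : pattern_)
      nnz.push_back(entries.size());
    return nnz;
  }

  // Column indices of row r in insertion order (not sorted).
  const std::vector<std::size_t>& row(std::size_t r) const { return pattern_[r]; }

  // Sort each row once, for a backend that must be filled in column order.
  void sort_rows()
  {
    for (auto& entries : pattern_)
      std::sort(entries.begin(), entries.end());
  }

private:
  std::vector<std::vector<std::size_t>> pattern_;
};
```

For instance, inserting the blocks {1, 0} x {1, 0} and {1, 2} x {1, 2} into a three-row pattern gives {2, 3, 2} non-zeroes per row, with row 1 stored as {1, 0, 2} until sort_rows() is called. The linear search costs time proportional to the row length per insertion, which stays small as long as each row holds only a modest number of entries.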
_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
