> Very nice!

Thanks :)
> Some comments:
>
> 1. Beating uBLAS by a factor of 3 is not that hard. Didem Unat (PhD
> student at UCSD/Simula) and Ilmar have been looking at the assembly in
> DOLFIN recently. We've done some initial benchmarks and have started
> investigating how to speed up the assembly. Take a look at what happens
> when we assemble into uBLAS:
>
> (i) Compute sparsity pattern
> (ii) Reset tensor
> (iii) Assemble
>
> For uBLAS, each of these steps costs approximately as much as a full
> assembly. I don't remember the exact numbers, but just by using a
> std::vector<std::map<int, double> > instead of a uBLAS matrix, one may
> skip (i) and (ii) and get a speedup.

I think this simplifies things too much. uBLAS has a matrix type called
"generalized vector of vectors" (gvov) that one can assemble into
without (i) and (ii), but then one has to copy the whole matrix into a
compressed row-major format afterwards. Artefacts of this
pre-sparsity-pattern "assembly matrix" can still be found in DOLFIN.
That MTL can take you from a fresh bilinear form to a matrix ripe for
Krylov iteration three times faster is, in my opinion, impressive.

> We've just started and don't have anything to present yet.
>
> 2. I've also looked at MTL before. We even considered using it as the
> main LA backend a (long) while back.
>
> 3. With the new LA interfaces in place, I wouldn't mind having MTL as
> an optional backend.
>
> --
> Anders
>
> On Tue, Jul 15, 2008 at 11:58:05PM +0200, Dag Lindbo wrote:
>> Hello!
>>
>> In light of the long and interesting discussion we had a while ago
>> about assembler performance, I decided to try to squeeze more out of
>> the uBLAS backend. This was not very successful.
>>
>> However, I've been following the development of MTL4
>> (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
>> interesting insertion scheme they provide.
>> I implemented a backend -- without sparsity pattern computation -- for
>> the DOLFIN assembler, and here are some first benchmark results:
>>
>> Incompressible Navier-Stokes on a 50x50x50 unit cube
>>
>> MTL --------------------------------------------------------
>> assembly time: 8.510000
>> reassembly time: 6.750000
>> vector assembly time: 6.070000
>>
>> memory: 230 MB
>>
>> UBLAS ------------------------------------------------------
>> assembly time: 23.030000
>> reassembly time: 12.140000
>> vector assembly time: 6.030000
>>
>> memory: 642 MB
>>
>> Poisson on a 2000x2000 unit square
>>
>> MTL --------------------------------------------------------
>> assembly time: 9.520000
>> reassembly time: 6.650000
>> vector assembly time: 4.730000
>> linear solve: 0.000000
>>
>> memory: 452 MB
>>
>> UBLAS ------------------------------------------------------
>> assembly time: 15.400000
>> reassembly time: 7.520000
>> vector assembly time: 5.020000
>>
>> memory: 1169 MB
>>
>> Conclusions? MTL is more than twice as fast and allocates less than
>> half the memory (since there is no sparsity pattern computation)
>> across the set of forms I've tested.
>>
>> The code is not perfectly done yet, but I'd still be happy to share it
>> with whoever wants to mess around with it.
>>
>> Cheers!
>>
>> /Dag
>>
>> _______________________________________________
>> DOLFIN-dev mailing list
>> [email protected]
>> http://www.fenics.org/mailman/listinfo/dolfin-dev
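[Editor's note: the map-based "assembly matrix" trick discussed in this thread -- inserting entries with no precomputed sparsity pattern and no tensor reset, then paying a one-off copy into compressed row storage before solving, as with uBLAS's gvov type -- can be sketched as below. This is an illustrative sketch only, not DOLFIN's, uBLAS's, or MTL4's actual code; the `AssemblyMatrix`, `CSRMatrix`, and `to_csr` names are hypothetical.]

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical "assembly matrix": one std::map per row. Insertion needs
// no precomputed sparsity pattern and no reset; repeated insertions at
// the same (row, col) simply accumulate, which is exactly what finite
// element assembly requires.
struct AssemblyMatrix {
    std::vector<std::map<int, double>> rows;

    explicit AssemblyMatrix(std::size_t n) : rows(n) {}

    // Add a dense n x n element matrix Ae (row-major) into the global
    // matrix at the given degrees of freedom, as an assembler would do
    // after computing each local element tensor.
    void add(const std::vector<int>& dofs, const std::vector<double>& Ae) {
        const std::size_t n = dofs.size();
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                rows[dofs[i]][dofs[j]] += Ae[i * n + j];
    }
};

// The price of skipping the sparsity pattern: a one-off copy into a
// compressed-row (CSR) layout before handing the matrix to a solver,
// analogous to the gvov-to-compressed-matrix copy mentioned above.
struct CSRMatrix {
    std::vector<int> row_ptr, col_idx;
    std::vector<double> values;
};

CSRMatrix to_csr(const AssemblyMatrix& A) {
    CSRMatrix B;
    B.row_ptr.push_back(0);
    for (const auto& row : A.rows) {
        for (const auto& [col, val] : row) {  // std::map iterates in column order
            B.col_idx.push_back(col);
            B.values.push_back(val);
        }
        B.row_ptr.push_back(static_cast<int>(B.col_idx.size()));
    }
    return B;
}
```

The accumulate-on-insert semantics of the map also explain the reassembly timings: reassembling reuses the already-allocated node structure, so subsequent assemblies are cheaper than the first.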
