Very nice! Some comments:
1. Beating uBLAS by a factor of 3 is not that hard. Didem Unat (PhD
   student at UCSD/Simula) and Ilmar have been looking at the assembly
   in DOLFIN recently. We've done some initial benchmarks and have
   started investigating how to speed up the assembly. Take a look at
   what happens when we assemble into uBLAS:

   (i)   Compute sparsity pattern
   (ii)  Reset tensor
   (iii) Assemble

   For uBLAS, each of these steps is approximately a full assembly
   pass in itself. I don't remember the exact numbers, but by just
   using an std::vector<std::map<int, double> > instead of a uBLAS
   matrix, one may skip (i) and (ii) and get a speedup. We've just
   started and don't have anything to present yet.

2. I've also looked at MTL before. We even considered using it as the
   main LA backend a (long) while back.

3. With the new LA interfaces in place, I wouldn't mind having MTL as
   an optional backend.

--
Anders


On Tue, Jul 15, 2008 at 11:58:05PM +0200, Dag Lindbo wrote:
> Hello!
>
> In light of the long and interesting discussion we had a while ago about
> assembler performance I decided to try to squeeze more out of the uBLAS
> backend. This was not very successful.
>
> However, I've been following the development of MTL4
> (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
> interesting insertion scheme they provide.
> I implemented a backend --
> without sparsity pattern computation -- for the dolfin assembler and here
> are some first benchmark results:
>
> Incomp Navier Stokes on 50x50x50 unit cube
>
> MTL --------------------------------------------------------
> assembly time: 8.510000
> reassembly time: 6.750000
> vector assembly time: 6.070000
>
> memory: 230 mb
>
> UBLAS ------------------------------------------------------
> assembly time: 23.030000
> reassembly time: 12.140000
> vector assembly time: 6.030000
>
> memory: 642 mb
>
> Poisson on 2000x2000 unit square
>
> MTL --------------------------------------------------------
> assembly time: 9.520000
> reassembly time: 6.650000
> assembly time: 4.730000
> vector linear solve: 0.000000
>
> memory: 452 mb
>
> UBLAS ------------------------------------------------------
> assembly time: 15.400000
> reassembly time: 7.520000
> vector assembly time: 5.020000
>
> memory: 1169 mb
>
> Conclusions? MTL is more than twice as fast and allocates less than half
> the memory (since there is no sparsity pattern computation) across a set
> of forms I've tested.
>
> The code is not perfectly done yet, but I'd still be happy to share it
> with whoever wants to mess around with it.
>
> Cheers!
>
> /Dag
>
> _______________________________________________
> DOLFIN-dev mailing list
> [email protected]
> http://www.fenics.org/mailman/listinfo/dolfin-dev
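
For reference, the idea in point 1 of the reply (assembling into an std::vector<std::map<int, double> > so that no sparsity pattern has to be computed and no tensor has to be reset first) might look roughly like the sketch below. This is only an illustration under assumptions: the names MapMatrix and assemble_1d are hypothetical and are not DOLFIN, uBLAS or MTL4 code, and the "form" is a toy 1D two-node element matrix rather than anything from the benchmarks above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Row-wise sparse accumulator: one ordered map (column -> value) per row.
// Hypothetical helper for illustration; not a DOLFIN or uBLAS class.
struct MapMatrix {
    std::vector<std::map<int, double> > rows;

    explicit MapMatrix(std::size_t n) : rows(n) {}

    // Add a dense local block (row-major, dofs.size() x dofs.size())
    // into the global matrix. No sparsity pattern is needed up front:
    // std::map::operator[] creates a zero entry on first touch.
    void add(const std::vector<int>& dofs, const std::vector<double>& block) {
        const std::size_t m = dofs.size();
        assert(block.size() == m * m);
        for (std::size_t i = 0; i < m; ++i) {
            assert(dofs[i] >= 0 &&
                   static_cast<std::size_t>(dofs[i]) < rows.size());
            std::map<int, double>& row = rows[dofs[i]];
            for (std::size_t j = 0; j < m; ++j)
                row[dofs[j]] += block[i * m + j];
        }
    }

    // Read an entry without inserting it; missing entries are zero.
    double get(int i, int j) const {
        std::map<int, double>::const_iterator it = rows[i].find(j);
        return it == rows[i].end() ? 0.0 : it->second;
    }

    // Number of stored entries.
    std::size_t nnz() const {
        std::size_t count = 0;
        for (std::size_t i = 0; i < rows.size(); ++i)
            count += rows[i].size();
        return count;
    }
};

// Toy assembly loop: n_cells intervals, two dofs per cell, and an
// unscaled element matrix [[1, -1], [-1, 1]] (an assumed example form).
MapMatrix assemble_1d(std::size_t n_cells) {
    MapMatrix A(n_cells + 1);
    const double local[4] = {1.0, -1.0, -1.0, 1.0};
    const std::vector<double> block(local, local + 4);
    std::vector<int> dofs(2);
    for (std::size_t c = 0; c < n_cells; ++c) {
        dofs[0] = static_cast<int>(c);
        dofs[1] = static_cast<int>(c + 1);
        A.add(dofs, block);
    }
    return A;
}
```

Calling assemble_1d(4) gives a 5x5 tridiagonal matrix built in a single pass over the cells. A real backend would presumably still convert the maps to a compressed row format before handing the matrix to a solver, and that conversion cost is not shown here.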
