Dag Lindbo wrote:
> Robert Kirby wrote:
>> Good news, but remember there is a tradeoff here -- MTL is designed
>> by some of the best generic programming folks around, and I am not
>> surprised that these results are great. Remember, first, that they
>> probably only support serial computing. PETSc has to keep track of a
>> bunch of stuff that enables parallel computing but may affect serial
>> performance.
>
> True. PETSc provides bindings to some very fancy external packages as
> well (BoomerAMG etc.), which is a huge benefit.
>
> My intentions with an MTL backend are:
> *) Explore the efficiency of the dolfin assembler and see how far we
>    can push it (a schematic of the assembly loop follows below)
> *) Possibly compete with uBLAS in the serial backend domain
>
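For context, the hot loop of the assembler being discussed looks
schematically like this (a sketch of the structure only, not DOLFIN's
actual code -- the variable and helper names here are simplified):

    // Schematic assembly loop: iterate cells, tabulate the element
    // tensor with generated UFC code, and add it block-wise into the
    // global matrix. The only backend-specific call in the loop is
    // A.add(), so backend insertion speed dominates the assembly
    // times reported below.
    for (CellIterator cell(mesh); !cell.end(); ++cell)
    {
        // Local-to-global degree-of-freedom maps for rows and columns
        dof_map_a.tabulate_dofs(rows, ufc_mesh, *cell);
        dof_map_b.tabulate_dofs(cols, ufc_mesh, *cell);

        // Dense element (cell) tensor from the generated code
        cell_integral.tabulate_tensor(block, coefficients, *cell);

        // Accumulate the dense block into the sparse global matrix
        A.add(block, m, rows, n, cols);
    }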
With the new linear algebra design, this should be pretty easy to add.

>> Also, there are issues besides simple assembly and matrix-vector
>> products. Many algebraic preconditioners need different kinds of
>> queries that may be optimized to different degrees in different
>> packages (e.g. extracting the diagonal). While assembly and matvec
>> are probably the two most crucial benchmarks, it would be
>> interesting to design a more robust set of benchmarks that would
>> test some of these other features as well. Not working primarily in
>> preconditioners, I'm not sure what should go in, but it should be
>> bigger rather than smaller (e.g. doing SOR, SSOR, ILU, pivoting for
>> LU, etc.)
>
> MTL4 sits nicely with a related project: ITL (from the same group),
> which provides the basic Krylov methods and the preconditioners you
> mention. I have yet to benchmark these, but I suspect they will be
> pretty solid for serial performance. Bindings to at least one
> high-end LU solver will have to be provided, as is the case today
> for uBLAS (UMFPACK).

Is it possible to access plain pointers to the underlying CSR matrix?
If so, it's easy to bolt on serial preconditioners and LU solvers (see
the UMFPACK sketch at the end of this message).

>> It is my observation that beating PETSc or Trilinos at simple things
>> is not that hard, but there is a lot of expert knowledge built into
>> these systems over many years that adds robustness and safety and at
>> least decent performance across a wide range of operations. Newer
>> packages targeting a specific research idea (e.g. template
>> metaprogramming) rather than servicing the scientific computing
>> world may or may not have this extra robustness built in yet.
>
> MTL4 is cutting edge in its own right. Whether it is mature enough, I
> can't tell for sure. It feels very solid to work with, though.

I looked at what I recall being MTL2 around the time the uBLAS backend
was implemented. The issue at the time was typical for research
projects: maintenance, continuity and completeness. Now that we have
some solid linear algebra backends (PETSc and uBLAS) and we've cleaned
up the interface, we can afford to experiment with additional backends.

Did you use DOLFIN::Assembler + MTL, or did you write another assembler
for testing?

The memory use looks strange to me. Did you perform an LU solve for the
uBLAS case? The sparsity pattern doesn't use much memory (just
integers, on the order of one per nonzero). Can you check the memory
when using PETSc?

Garth

> Thanks for your input!
> /Dag
>
>> Rob
>>
>> On Wed, Jul 16, 2008 at 9:47 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> Sounds amazing!
>>>
>>> I'd like to see that code, although I cannot promise you much
>>> response during my holiday, which is starting tomorrow.
>>>
>>> Have you compared matrix-vector products with uBlas or PETSc?
>>>
>>> Kent
>>>
>>>
>>>> Hello!
>>>>
>>>> In light of the long and interesting discussion we had a while ago
>>>> about assembler performance, I decided to try to squeeze more out
>>>> of the uBlas backend. This was not very successful.
>>>>
>>>> However, I've been following the development of MTL4
>>>> (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
>>>> interesting insertion scheme they provide.
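For readers unfamiliar with it, the MTL4 insertion scheme works roughly
as follows (a minimal sketch based on the MTL4 documentation; the
namespace and update-functor names are from memory and may differ
between MTL4 revisions):

    #include <iostream>
    #include <boost/numeric/mtl/mtl.hpp>

    int main()
    {
        typedef mtl::compressed2D<double> matrix_type;
        matrix_type A(4, 4);

        {
            // The inserter buffers insertions in per-row "slots"
            // (here 3 per row, a preallocation hint only) and
            // compresses them into the final sparse storage when it
            // goes out of scope -- no precomputed sparsity pattern
            // is needed. update_plus makes repeated insertions
            // accumulate, which is what FE assembly requires.
            mtl::matrix::inserter<matrix_type,
                                  mtl::update_plus<double> > ins(A, 3);
            ins[0][0] << 2.0;
            ins[0][1] << -1.0;
            ins[0][0] << 1.0;   // accumulates: A(0,0) becomes 3.0
        }   // A is finalized here

        std::cout << A;
        return 0;
    }

The appeal over a static sparsity pattern is that the pattern is
discovered during insertion, so the memory for a separate pattern data
structure is never allocated.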
>>>> I implemented a backend -- without sparsity pattern computation --
>>>> for the dolfin assembler, and here are some first benchmark
>>>> results:
>>>>
>>>> Incomp Navier-Stokes on 50x50x50 unit cube
>>>>
>>>> MTL --------------------------------------------------------
>>>> assembly time: 8.510000
>>>> reassembly time: 6.750000
>>>> vector assembly time: 6.070000
>>>>
>>>> memory: 230 MB
>>>>
>>>> UBLAS ------------------------------------------------------
>>>> assembly time: 23.030000
>>>> reassembly time: 12.140000
>>>> vector assembly time: 6.030000
>>>>
>>>> memory: 642 MB
>>>>
>>>> Poisson on 2000x2000 unit square
>>>>
>>>> MTL --------------------------------------------------------
>>>> assembly time: 9.520000
>>>> reassembly time: 6.650000
>>>> vector assembly time: 4.730000
>>>> linear solve: 0.000000
>>>>
>>>> memory: 452 MB
>>>>
>>>> UBLAS ------------------------------------------------------
>>>> assembly time: 15.400000
>>>> reassembly time: 7.520000
>>>> vector assembly time: 5.020000
>>>>
>>>> memory: 1169 MB
>>>>
>>>> Conclusions? MTL is more than twice as fast and allocates less
>>>> than half the memory (since there is no sparsity pattern
>>>> computation) across the set of forms I've tested.
>>>>
>>>> The code is not perfectly done yet, but I'd still be happy to
>>>> share it with whoever wants to mess around with it.
>>>>
>>>> Cheers!
>>>>
>>>> /Dag
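On the CSR-pointer question above: if a backend exposes its raw
compressed arrays, a serial LU binding is a small amount of glue. Here
is a minimal sketch with UMFPACK (the accessor that produces the three
arrays is assumed to exist and is hypothetical; UMFPACK itself wants
compressed-column storage, so the CSR arrays of A are handed over as
the CSC storage of A^T and a transposed solve is requested):

    #include <umfpack.h>

    // Solve A x = b via UMFPACK, given raw CSR arrays of A:
    // row_ptr has length n+1; col_idx and values have length
    // row_ptr[n]. Since UMFPACK expects compressed-COLUMN storage,
    // these arrays describe A^T to UMFPACK, so we request the
    // transposed solve (UMFPACK_At) to recover x for A x = b.
    // Assumes sorted column indices without duplicates; checking of
    // the UMFPACK return codes is omitted for brevity.
    void csr_lu_solve(int n, const int* row_ptr, const int* col_idx,
                      const double* values, const double* b, double* x)
    {
        void* symbolic = 0;
        void* numeric  = 0;

        // Symbolic analysis and numeric factorization of "A^T"
        umfpack_di_symbolic(n, n, row_ptr, col_idx, values,
                            &symbolic, 0, 0);
        umfpack_di_numeric(row_ptr, col_idx, values, symbolic,
                           &numeric, 0, 0);
        umfpack_di_free_symbolic(&symbolic);

        // Transposed solve: (A^T)^T x = A x = b
        umfpack_di_solve(UMFPACK_At, row_ptr, col_idx, values,
                         x, b, numeric, 0, 0);
        umfpack_di_free_numeric(&numeric);
    }

This is essentially what the uBLAS backend's UMFPACK binding does
today, so an MTL backend that exposes its arrays could reuse the same
approach.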
