On Sun 2008-05-18 21:54, Johan Hoffman wrote: > > On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote: > > > > 1. Solve time may dominate assemble anyway so that's where we should > > optimize. > > Yes, there may be such cases, in particular for simple forms (Laplace > equation etc.). For more complex forms with more terms and coefficients, > assembly typically dominates, from what I have seen. This is the case for > the flow problems of Murtazo for example.
This probably depends if you use are using a projection method. If you are solving the saddle point problem, you can forget about assembly time. But optimizing the solve is all about constructing a good preconditioner. If the operator is elliptic then AMG should work well and you don't have to think, but if it is indefinite all bets are off. I think we can build saddle point preconditioners now by writing some funny-looking mixed form files, but that could be made easier. > > 2. Assembling the action instead of the operator removes the A.add() > > bottleneck. > > True. But it may be worthwhile to put some effort into optimizing also the > matrix assembly. In any case, you have to form something to precondition with. > > As mentioned before, we are experimenting with iterating locally over > > cells sharing common dofs and combining batches of element tensors > > before inserting into the global sparse matrix row by row. Let's see > > how it goes. > > Yes, this is interesting. Would be very interesting to hear about the > progress. > > It is also interesting to understand what would optimize the insertion for > different linear algebra backends, in particular Jed seems to have a good > knowledge on petsc. We could then build backend optimimization into the > local dof-orderings etc. I just press M-. when I'm curious :-) I can't imagine it pays to optimize for a particular backend (it's not PETSc anyway, rather whichever format is used by the preconditioner). The CSR data structure is pretty common, but it will always be fastest to insert an entire row at once. If using an intermediate hashed structure makes this convenient, then it would help. The paper I posted assembles the entire matrix in hashed format and then converts it to CSR. I'll guess that a hashed cache for the assembly (flushed every few MiB, for instance) would work at least as well as assembling the entire thing in hashed format. Jed
pgpWLAMd0rR6C.pgp
Description: PGP signature
_______________________________________________ DOLFIN-dev mailing list [email protected] http://www.fenics.org/mailman/listinfo/dolfin-dev
