On Sun, May 18, 2008 at 10:11:31PM +0200, Murtazo Nazarov wrote:

> Sorry, I forgot to put an attachment:
>
> > > > ...
> > > >
> > > > But we also need to remember that
> > > >
> > > > 1. Solve time may dominate assembly anyway, so that's where we
> > > >    should optimize.
> > > >
> > > > 2. Assembling the action instead of the operator removes the
> > > >    A.add() bottleneck.
> > > >
> > > > As mentioned before, we are experimenting with iterating locally
> > > > over cells sharing common dofs and combining batches of element
> > > > tensors before inserting into the global sparse matrix row by row.
> > > > Let's see how it goes.
> > > >
> > > > Some other comments:
> > > >
> > > > Murtazo: It's interesting that femLego is that much faster. It
> > > > would be interesting to see exactly why it is faster. Can you take
> > > > a simple example (maybe even Poisson), profile both femLego and
> > > > DOLFIN on the same mesh, and get detailed results for
> > > > tabulate_tensor, tabulate_dofs and A.add()? If femLego is faster
> > > > on A.add(), then what linear algebra backend is it using?
> > >
> > > Yes, the test we did is a simple 2D Poisson problem on a unit square
> > > mesh, and the assembly in femLego is 3 times faster, because A.add()
> > > is done in the way I described in previous mails. The linear algebra
> > > package is AZTEC. DOLFIN should perhaps be much faster than femLego
> > > if A.add() is the same, since FFC is much faster than the quadrature
> > > rule approach.
> >
> > I thought AZTEC was just solvers. I mean, what sparse matrix format is
> > used? And what does the interface look like for communicating the
> > exact position for insertion?
>
> The sparse matrix is just a double* atw. Attached I send you the
> subroutine which does this A.add(): idxatw(el,li,lj) is the index into
> the global matrix for cell = el, row = li, col = lj.
>
> > > > Murtazo: It seems you suggest we should basically store some kind
> > > > of index/pointer for where to write directly to memory when adding
> > > > the entries. This has two problems: (i) we need to store quite a
> > > > few of these indices (n^2*num_cells, where n is the local space
> > > > dimension), and (ii) it requires cooperation from the linear
> > > > algebra backend. We would need to ask PETSc to tell us where in
> > > > its private data structures it inserted the entries.
> > >
> > > Yes, maybe there is a better way to do it. If we store the global
> > > indices of A, it will be in total A.nz()*numVertices*num_components,
> > > but we will still have a speedup, which is more important in some
> > > cases.
> >
> > That doesn't look correct to me. I think we would need n^2*num_cells.
> > We iterate over the cells, and for each cell we have n^2 entries and
> > we need to know where to insert each one of those.
>
> I have contributed to and have experience with femLego (I did my degree
> project with it). This approach also works well in parallel. The memory
> cost may not be a problem for a very large mesh, since one needs to use
> parallel processors anyway.
>
> murtazo
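To make the scheme under discussion concrete, here is a rough C++ sketch of
the idea (a minimal illustration only, not femLego or DOLFIN code; the CSR
layout and all names such as CSRMatrix and precompute_insertion_indices are
assumptions): one precomputation pass records, for every cell, where each of
the n^2 local entries lands in the flat value array, and the assembly loop
then adds with a single indexed write, much like atw(idxatw(el,li,lj)) above.

    // Sketch of precomputed insertion indices for sparse assembly.
    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    struct CSRMatrix                       // plain CSR storage, like "double* atw"
    {
      std::vector<std::size_t> row_start;  // size num_rows + 1
      std::vector<std::size_t> col_index;  // column of each stored nonzero
      std::vector<double> values;          // the nonzero values themselves

      // Find the flat position of entry (i, j); used only during precomputation.
      std::size_t position(std::size_t i, std::size_t j) const
      {
        for (std::size_t k = row_start[i]; k < row_start[i + 1]; ++k)
          if (col_index[k] == j)
            return k;
        throw std::runtime_error("entry not in sparsity pattern");
      }
    };

    // Precompute insertion indices: n^2 positions per cell, n = local dimension.
    // dofs[cell*n + i] is the global dof of local basis function i on that cell.
    std::vector<std::size_t> precompute_insertion_indices(
        const CSRMatrix& A, const std::vector<std::size_t>& dofs,
        std::size_t num_cells, std::size_t n)
    {
      std::vector<std::size_t> idx(num_cells * n * n);
      for (std::size_t c = 0; c < num_cells; ++c)
        for (std::size_t i = 0; i < n; ++i)
          for (std::size_t j = 0; j < n; ++j)
            idx[(c*n + i)*n + j] = A.position(dofs[c*n + i], dofs[c*n + j]);
      return idx;
    }

    // Assembly: the add is now a single indexed write, the analogue of
    //   atw(idxatw(el,li,lj)) = atw(idxatw(el,li,lj)) + Atmp(li,lj)
    void add_element_matrix(CSRMatrix& A, const std::vector<std::size_t>& idx,
                            const double* Ae, std::size_t cell, std::size_t n)
    {
      for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
          A.values[idx[(cell*n + i)*n + j]] += Ae[i*n + j];
    }

The search through the sparsity pattern is still done, but only once per mesh
rather than in every assembly, which seems to be where the reported femLego
speedup comes from.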
It looks to me like the storage needed is indeed n^2*num_cells. I'm not
fluent in Fortran, but that's how I interpret this line:

    atw(idxatw(el,li,lj)) = atw(idxatw(el,li,lj)) + Atmp(li,lj)

This looks expensive (in terms of memory), but maybe not that expensive?

--
Anders
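A back-of-the-envelope check of that memory cost (the numbers are assumptions
for illustration, not figures from the thread: a mesh of one million
triangles, linear elements so n = 3, and 8-byte indices):

    #include <cstdio>

    int main()
    {
      // Assumed example: 1,000,000 cells, P1 triangles (n = 3 local dofs),
      // one precomputed index per local matrix entry, stored as 8-byte integers.
      const double num_cells = 1e6;
      const double n = 3;                  // local space dimension
      const double bytes_per_index = 8;

      const double num_indices = n * n * num_cells;             // n^2 * num_cells
      const double megabytes = num_indices * bytes_per_index / (1024.0 * 1024.0);

      std::printf("%.0f indices, about %.0f MB\n", num_indices, megabytes);
      // Prints: 9000000 indices, about 69 MB
      return 0;
    }

So on the order of tens of megabytes per million cells under these
assumptions; 4-byte indices would halve it. Whether that is acceptable
presumably depends on how it compares with the storage for the matrix and
the mesh themselves.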
