On Wed, Nov 10, 2010 at 9:47 AM, Anders Logg <[email protected]> wrote:
> On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
>> Nice to see multi-thread assembly being added. We should look at
>> adding support for the multi-threaded version of SuperLU. What other
>> multi-thread solvers are out there?
>
> Yes, that would be good, but I don't know which solvers are available.

SuperLU tends to die on large problems. MUMPS is a much better option.

>
>> I haven't looked at the code in great detail, but are element
>> tensors being added to the global tensor in a thread-safe fashion?
>> Both PETSc and Trilinos are not thread-safe.
>
> Yes, they should be. That's the main point. It's a very simple algorithm
> which just partitions the matrix row by row and makes each process
> responsible for a chunk of rows. During assembly, all processes
> iterate over the entire mesh and, on each cell, do one of three things:
>
>   1. all_in_range:  tabulate_tensor as usual and add
>   2. none_in_range: skip tabulate_tensor (continue)
>   3. some_in_range: tabulate_tensor and insert only rows in range
>
> Didem Unat (PhD student at UCLA/Simula) tried this in a simple
> prototype code and got very good speedups (up to a factor 7 on an
> eight-core machine) so it's just a matter of doing the same thing as
> part of DOLFIN (which is a bit trickier since some of the data access
> is hidden). The current implementation in DOLFIN seems to work and
> gives some small speedup but I need to do some more testing.
>
>> Rather than having two assembly classes, would it be worth using
>> OpenMP instead? I experimented with OpenMP some time ago, but never
>> added it since at the time it required a very recent version of gcc.
>> This shouldn't be a problem now.
>
> I don't think this would work with OpenMP since we need to control how
> the rows are inserted.
>
> If this works out and we get good speedups, we could consider
> replacing Assembler by MulticoreAssembler. It's not that much extra
> code and it's pretty clean. I haven't tried yet, but it should also
> work in combination with MPI (each node has a part of the mesh and
> does multi-core assembly).
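
For reference, the row-partitioned scheme quoted above can be sketched in
a few lines of Python. This is only a toy illustration under stated
assumptions, not the DOLFIN MulticoreAssembler: the mesh is a 1D chain of
cells with two dofs each, the global matrix is a dense list of lists
rather than a PETSc or Trilinos matrix, and every name except
tabulate_tensor (cell_dofs, row_ranges, assemble_rows,
assemble_multicore, assemble_serial) is hypothetical. Python threads will
not give a real speedup here; the point is only that each worker writes
to its own rows, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def tabulate_tensor(cell):
    """Hypothetical element matrix: 1D P1 stiffness on a unit-length cell."""
    return [[1.0, -1.0], [-1.0, 1.0]]


def cell_dofs(cell):
    """Global dof indices of a cell in a 1D chain (assumed numbering)."""
    return (cell, cell + 1)


def row_ranges(n, num_workers):
    """Split rows 0..n-1 into contiguous half-open chunks [lo, hi)."""
    chunk = -(-n // num_workers)  # ceiling division
    return [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]


def assemble_rows(A, num_cells, lo, hi):
    """Visit every cell, but only write rows in the owned range [lo, hi)."""
    for cell in range(num_cells):
        dofs = cell_dofs(cell)
        owned = [lo <= d < hi for d in dofs]
        if not any(owned):
            continue  # none_in_range: skip tabulate_tensor entirely
        Ae = tabulate_tensor(cell)
        # all_in_range and some_in_range share this loop:
        # rows outside [lo, hi) are simply not inserted.
        for i, row in enumerate(dofs):
            if owned[i]:
                for j, col in enumerate(dofs):
                    A[row][col] += Ae[i][j]


def assemble_multicore(num_cells, num_workers):
    """Assemble with one worker per row chunk; workers own disjoint rows."""
    n = num_cells + 1
    A = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(assemble_rows, A, num_cells, lo, hi)
                   for lo, hi in row_ranges(n, num_workers)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return A


def assemble_serial(num_cells):
    """Reference assembly: a single worker that owns all rows."""
    n = num_cells + 1
    A = [[0.0] * n for _ in range(n)]
    assemble_rows(A, num_cells, 0, n)
    return A
```

Calling assemble_multicore(8, 3) and assemble_serial(8) should produce
the same matrix. Note the trade-off in this sketch: every worker still
loops over the whole mesh, so the savings come from skipping
tabulate_tensor on cells with no owned rows, while cells that straddle a
chunk boundary are tabulated by more than one worker.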
>
> --
> Anders
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dolfin
> Post to     : [email protected]
> Unsubscribe : https://launchpad.net/~dolfin
> More help   : https://help.launchpad.net/ListHelp

