On Wed, Nov 10, 2010 at 9:55 AM, Garth N. Wells <[email protected]> wrote:
>
>
> On 10/11/10 15:53, Andy Ray Terrel wrote:
>>
>> On Wed, Nov 10, 2010 at 9:47 AM, Anders Logg <[email protected]> wrote:
>>>
>>> On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
>>>>
>>>> Nice to see multi-threaded assembly being added. We should look at
>>>> adding support for the multi-threaded version of SuperLU. What other
>>>> multi-threaded solvers are out there?
>>>
>>> Yes, that would be good, but I don't know which solvers are available.
>>
>> SuperLU tends to die on large problems. MUMPS is a much better option.
>>
>
> MUMPS is MPI-based. SuperLU has a multi-threaded version for shared-memory
> machines.
>
> Garth
Yes, but you compile it to take advantage of MPI's shared-memory message
passing.

>
>>>
>>>> I haven't looked at the code in great detail, but are element
>>>> tensors being added to the global tensor in a thread-safe fashion?
>>>> Both PETSc and Trilinos are not thread-safe.
>>>
>>> Yes, they should be. That's the main point. It's a very simple algorithm
>>> which just partitions the matrix row by row and makes each process
>>> responsible for a chunk of rows. During assembly, all processes
>>> iterate over the entire mesh and on each cell do one of three things:
>>>
>>> 1. all_in_range: tabulate_tensor as usual and add
>>> 2. none_in_range: skip tabulate_tensor (continue)
>>> 3. some_in_range: tabulate_tensor and insert only rows in range
>>>
>>> Didem Unat (PhD student at UCLA/Simula) tried this in a simple
>>> prototype code and got very good speedups (up to a factor 7 on an
>>> eight-core machine), so it's just a matter of doing the same thing as
>>> part of DOLFIN (which is a bit trickier since some of the data access
>>> is hidden). The current implementation in DOLFIN seems to work and
>>> give some small speedup, but I need to do some more testing.
>>>
>>>> Rather than having two assembly classes, would it be worth using
>>>> OpenMP instead? I experimented with OpenMP some time ago, but never
>>>> added it since at the time it required a very recent version of gcc.
>>>> This shouldn't be a problem now.
>>>
>>> I don't think this would work with OpenMP since we need to control how
>>> the rows are inserted.
>>>
>>> If this works out and we get good speedups, we could consider
>>> replacing Assembler by MulticoreAssembler. It's not that much extra
>>> code and it's pretty clean. I haven't tried yet, but it should also
>>> work in combination with MPI (each node has a part of the mesh and
>>> does multi-core assembly).
>>>
>>> --
>>> Anders
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dolfin
>>> Post to     : [email protected]
>>> Unsubscribe : https://launchpad.net/~dolfin
>>> More help   : https://help.launchpad.net/ListHelp
>>>

