On Fri, Dec 19, 2008 at 8:54 AM, Anders Logg <[email protected]> wrote:
> On Fri, Dec 19, 2008 at 12:09:50PM +0900, Evan Lezar wrote:
>>
>> On Thu, Dec 18, 2008 at 4:25 AM, Johan Hake <[email protected]> wrote:
>>
>> On Wednesday 17 December 2008 20:19:52 Anders Logg wrote:
>> > On Wed, Dec 17, 2008 at 08:13:03PM +0100, Johan Hake wrote:
>> > > On Wednesday 17 December 2008 19:20:11 Anders Logg wrote:
>> > > > Ola and I have now finished up the first round of getting DOLFIN to
>> > > > run in parallel. In short, we can now parse meshes from file in
>> > > > parallel and partition meshes in parallel (using ParMETIS).
>> > > >
>> > > > We reused some good ideas that Niclas Jansson had implemented in
>> > > > PXMLMesh before, but have also made some significant changes as
>> > > > follows:
>> > > >
>> > > > 1. The XML reader does not handle any partitioning.
>> > > >
>> > > > 2. The XML reader just reads in a chunk of the mesh data on each
>> > > > processor (in parallel) and stores it in a LocalMeshData object
>> > > > (one for each processor). The data is just partitioned in blocks, so
>> > > > the vertices and cells may be completely unrelated.
>> > > >
>> > > > 3. The partitioning takes place in MeshPartitioning::partition,
>> > > > which gets a LocalMeshData object on each processor. It then calls
>> > > > ParMETIS to compute a partition (in parallel) and then redistributes
>> > > > the data accordingly. Finally, a mesh is built on each processor
>> > > > using the local data.
>> > > >
>> > > > 4. All direct MPI calls (except one, which should be removed) have
>> > > > been removed from the code. Instead, we mostly rely on
>> > > > dolfin::MPI::distribute, which handles most cases of parallel
>> > > > communication and works with STL data structures.
>> > > >
>> > > > 5. There is just one ParMETIS call (no initial geometric
>> > > > partitioning). It seemed like an unnecessary step, or are there good
>> > > > reasons to perform the partitioning in two steps?
>> > > >
>> > > > For testing, go to sandbox/passembly, build and then run
>> > > >
>> > > >   mpirun -n 4 ./demo
>> > > >   ./plot_partitions 4
>> > >
>> > > Looks beautiful!
>> > >
>> > > I threw a 3D mesh of 160K vertices onto it, and it was partitioned
>> > > nicely in some 10 s on my 2-core laptop.
>> > >
>> > > Johan
>> >
>> > Nice, in particular since we haven't run any 3D test cases ourselves,
>> > just a tiny mesh of the unit square... :-)
>>
>> Yes, I thought so too ;)
>>
>> Johan
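As a rough illustration of point 4 above: redistributing STL-based data so that each process ends up with the values the others meant for it is essentially an all-to-all exchange. A minimal sketch in raw MPI follows; the function name mirrors the mentioned dolfin::MPI::distribute, but this is not DOLFIN's actual implementation and the real interface may well differ.

    // Illustrative only: a distribute-style all-to-all exchange of STL
    // vectors written with raw MPI. send[p] holds the values this
    // process wants to end up on process p; the return value holds
    // everything the other processes sent here. For brevity, the
    // sketch assumes non-empty buffers.
    #include <mpi.h>
    #include <vector>

    std::vector<double> distribute(const std::vector<std::vector<double> >& send)
    {
      int num_procs = 0;
      MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

      // Flatten outgoing data, recording per-process counts and offsets
      std::vector<int> send_counts(num_procs), send_offsets(num_procs, 0);
      std::vector<double> send_buffer;
      for (int p = 0; p < num_procs; ++p)
      {
        send_counts[p] = static_cast<int>(send[p].size());
        if (p > 0)
          send_offsets[p] = send_offsets[p - 1] + send_counts[p - 1];
        send_buffer.insert(send_buffer.end(), send[p].begin(), send[p].end());
      }

      // Exchange counts so every process can size its receive buffer
      std::vector<int> recv_counts(num_procs), recv_offsets(num_procs, 0);
      MPI_Alltoall(&send_counts[0], 1, MPI_INT,
                   &recv_counts[0], 1, MPI_INT, MPI_COMM_WORLD);
      int total = recv_counts[0];
      for (int p = 1; p < num_procs; ++p)
      {
        recv_offsets[p] = recv_offsets[p - 1] + recv_counts[p - 1];
        total += recv_counts[p];
      }

      // Exchange the values themselves
      std::vector<double> recv_buffer(total);
      MPI_Alltoallv(&send_buffer[0], &send_counts[0], &send_offsets[0],
                    MPI_DOUBLE, &recv_buffer[0], &recv_counts[0],
                    &recv_offsets[0], MPI_DOUBLE, MPI_COMM_WORLD);
      return recv_buffer;
    }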
>> While we are on the topic of parallelisation, I have some comments.
>>
>> A couple of months ago I was trying to parallelise some of my own code
>> and ran into some problems with the communicators used for PETSc and
>> SLEPc problems. I sort of got them to work, but never committed my
>> changes because they were a little ungainly and I felt I needed to
>> spend some more time on them.
>>
>> One thing that I did notice is that the user does not have much
>> control over which parts of the process run in parallel: the number
>> of MPI processes decides whether or not a parallel implementation
>> should be used. In my case, what I wanted to do was perform a
>> frequency sweep and, for each frequency point, perform the same
>> calculation (an eigenvalue problem) for that frequency. My intention
>> was to distribute the frequency sweep over a number of processors and
>> then handle the assembly and solution of each system separately. This
>> is not possible as the code is now, since the assembly and eigenvalue
>> solvers all try to run in parallel as soon as the application is run
>> as an MPI program.
>>
>> I know that this is not very descriptive, but does anyone else have
>> thoughts on the matter? I will put together something a little more
>> concrete as soon as I have a chance (I am travelling around quite a
>> bit at the moment, so it is difficult for me to focus).
>
> We're just getting started. Everything needs to be configurable. At
> the moment, we assume that all parallel computation should be split
> across MPI::num_processes() processes.
>
> We can either add optional parameters to all parallel calls or add
> global options to control how many processes are used. But I suggest
> we wait until we get all the pieces in place. We currently have
> parallel parsing and partitioning working. The next step will be to
> get parallel assembly working, then the parallel solve (presumably
> simple, since it is handled by PETSc or Epetra).
>
> --
> Anders
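To make Evan's use case concrete: what he describes amounts to splitting MPI_COMM_WORLD into independent sub-communicators, one group per frequency point, with assembly and the eigenvalue solve running inside each group. A bare-MPI sketch, purely illustrative since DOLFIN currently has no hook for accepting such a communicator:

    // Sketch: split the world into independent groups so that each
    // group can assemble and solve its own eigenvalue problem for one
    // frequency, never seeing the other processes. Illustration only.
    #include <mpi.h>

    int main(int argc, char* argv[])
    {
      MPI_Init(&argc, &argv);

      int rank = 0, size = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      // E.g. 8 processes and 4 groups -> 4 groups of 2 processes.
      // Each group handles the frequency points congruent to its
      // index (group 1 of 4 handles points 1, 5, 9, ...).
      const int num_groups = 4;
      const int group = rank % num_groups;

      MPI_Comm group_comm;
      MPI_Comm_split(MPI_COMM_WORLD, group, rank, &group_comm);

      // Assembly and the SLEPc solve for this group's frequencies
      // would go here, using group_comm instead of MPI_COMM_WORLD.
      // This is exactly where DOLFIN would need to accept a
      // communicator parameter.

      MPI_Comm_free(&group_comm);
      MPI_Finalize();
      return 0;
    }

MPI_Comm_split groups processes by its color argument, so every group gets a communicator that behaves like a private MPI world of its own.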
Global options wouldn't solve anything; the point here is a variable
number of processes during the application's lifetime. A single
parameter is probably enough: the communicator, perhaps permanently
attached to each FunctionSpace (the dofmap must be distributed over the
processes in a communicator to be usable). Calls to
MPI::num_processes() would then be replaced by
V.communicator().num_processes() or something similar.

I get the waiting argument, though. When stuff works, a search for
MPI::* should reveal the places where "V.communicator()." should
replace "MPI::", likely covering most of the places that need to be
updated.

Martin
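A hypothetical sketch of what Martin's suggestion might look like; no such API exists in DOLFIN at the time of writing, and all names below are invented for illustration.

    // Hypothetical (nothing like this exists in DOLFIN yet): a thin
    // communicator wrapper that a FunctionSpace could carry around.
    #include <mpi.h>

    class Communicator
    {
    public:

      explicit Communicator(MPI_Comm comm) : comm(comm) {}

      unsigned int num_processes() const
      {
        int size = 0;
        MPI_Comm_size(comm, &size);
        return static_cast<unsigned int>(size);
      }

      MPI_Comm comm;
    };

    // Usage, if a FunctionSpace carried one of these (invented API):
    //
    //   FunctionSpace V(mesh, element, dofmap, Communicator(group_comm));
    //   const unsigned int n = V.communicator().num_processes();
    //
    // All parallel operations touching V (dofmap distribution,
    // assembly, solve) would then be scoped to group_comm rather than
    // MPI_COMM_WORLD.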
