On Fri, Dec 19, 2008 at 9:08 PM, Anders Logg <[email protected]> wrote: > On Fri, Dec 19, 2008 at 01:01:50PM +0100, Martin Sandve Alnæs wrote: >> On Fri, Dec 19, 2008 at 11:54 AM, Anders Logg <[email protected]> wrote: >> > On Fri, Dec 19, 2008 at 09:23:43AM +0100, Martin Sandve Alnæs wrote: >> >> On Fri, Dec 19, 2008 at 9:12 AM, Anders Logg <[email protected]> wrote: >> >> > On Fri, Dec 19, 2008 at 09:06:43AM +0100, Martin Sandve Alnæs wrote: >> >> >> On Fri, Dec 19, 2008 at 8:54 AM, Anders Logg <[email protected]> wrote: >> >> >> > On Fri, Dec 19, 2008 at 12:09:50PM +0900, Evan Lezar wrote: >> >> >> >> >> >> >> >> >> >> >> >> On Thu, Dec 18, 2008 at 4:25 AM, Johan Hake <[email protected]> wrote: >> >> >> >> >> >> >> >> On Wednesday 17 December 2008 20:19:52 Anders Logg wrote: >> >> >> >> > On Wed, Dec 17, 2008 at 08:13:03PM +0100, Johan Hake wrote: >> >> >> >> > > On Wednesday 17 December 2008 19:20:11 Anders Logg wrote: >> >> >> >> > > > Ola and I have now finished up the first round of getting >> >> >> >> DOLFIN to >> >> >> >> > > > run in parallel. In short, we can now parse meshes from >> >> >> >> file in >> >> >> >> > > > parallel and partition meshes in parallel (using >> >> >> >> ParMETIS). >> >> >> >> > > > >> >> >> >> > > > We reused some good ideas that Niclas Jansson had >> >> >> >> implemented in >> >> >> >> > > > PXMLMesh before, but have also made some significant >> >> >> >> changes as >> >> >> >> > > > follows: >> >> >> >> > > > >> >> >> >> > > > 1. The XML reader does not handle any partitioning. >> >> >> >> > > > >> >> >> >> > > > 2. The XML reader just reads in a chunk of the mesh data >> >> >> >> on each >> >> >> >> > > > processor (in parallel) and stores that into a >> >> >> >> LocalMeshData object >> >> >> >> > > > (one for each processor). The data is just partitioned in >> >> >> >> blocks so >> >> >> >> > > > the vertices and cells may be completely unrelated. >> >> >> >> > > > >> >> >> >> > > > 3. The partitioning takes place in >> >> >> >> MeshPartitioning::partition, >> >> >> >> > > > which gets a LocalMeshData object on each processor. It >> >> >> >> then calls >> >> >> >> > > > ParMETIS to compute a partition (in parallel) and then >> >> >> >> redistributes >> >> >> >> > > > the data accordingly. Finally, a mesh is built on each >> >> >> >> processor >> >> >> >> using >> >> >> >> > > > the local data. >> >> >> >> > > > >> >> >> >> > > > 4. All direct MPI calls (except one which should be >> >> >> >> removed) have >> >> >> >> been >> >> >> >> > > > removed from the code. Instead, we mostly rely on >> >> >> >> > > > dolfin::MPI::distribute which handles most cases of >> >> >> >> parallel >> >> >> >> > > > communication and works with STL data structures. >> >> >> >> > > > >> >> >> >> > > > 5. There is just one ParMETIS call (no initial geometric >> >> >> >> > > > partitioning). It seemed like an unnecessary step, or are >> >> >> >> there good >> >> >> >> > > > reasons to perform the partitioning in two steps? >> >> >> >> > > > >> >> >> >> > > > For testing, go to sandbox/passembly, build and then run >> >> >> >> > > > >> >> >> >> > > > mpirun -n 4 ./demo >> >> >> >> > > > ./plot_partitions 4 >> >> >> >> > > >> >> >> >> > > Looks beautiful! >> >> >> >> > > >> >> >> >> > > I threw a 3D mesh of 160K vertices onto it, and it was >> >> >> >> partitioned >> >> >> >> nicely >> >> >> >> > > in some 10 s, on my 2 core laptop. >> >> >> >> > > >> >> >> >> > > Johan >> >> >> >> > >> >> >> >> > Nice, in particular since we haven't run any 3D test cases >> >> >> >> ourselves, >> >> >> >> > just a tiny mesh of the unit square... :-) >> >> >> >> >> >> >> >> Yes I thought so too ;) >> >> >> >> >> >> >> >> Johan >> >> >> >> _______________________________________________ >> >> >> >> DOLFIN-dev mailing list >> >> >> >> [email protected] >> >> >> >> http://www.fenics.org/mailman/listinfo/dolfin-dev >> >> >> >> >> >> >> >> >> >> >> >> While we are on the topic of parallelisation - I have some comments. >> >> >> >> >> >> >> >> A couple of months ago I was trying to parallelise some of my own >> >> >> >> code and ran >> >> >> >> into some problems with the communicators used for PETSc and SLEPc >> >> >> >> problems - I >> >> >> >> sort of got them to work, but never commited my changes because >> >> >> >> they were a >> >> >> >> little ungainly and I felt I needed to spend some more time on them. >> >> >> >> >> >> >> >> One thing that I did notice is that the user does not have much >> >> >> >> control over >> >> >> >> which parts of the process run in parallel - with the number of MPI >> >> >> >> processes >> >> >> >> deciding whether or not a parallel implementation should be used. >> >> >> >> In my case >> >> >> >> what I wanted to do was perform a frequency sweep and for each >> >> >> >> frequency point >> >> >> >> perform the same calculation (an eigenvalue problem) for that >> >> >> >> frequency. My >> >> >> >> intention was to distribute the frequency sweep over a number of >> >> >> >> processors and >> >> >> >> then handle the assembly and solution of each system separately. >> >> >> >> This is not >> >> >> >> possible as the code is now since the assembly and eigenvalue >> >> >> >> solvers all try >> >> >> >> to run in parallel as soon as the applicaion is run as an mpi >> >> >> >> program. >> >> >> >> >> >> >> >> I know that this is not very descriptive, but does anyone else have >> >> >> >> thoughts on >> >> >> >> the matter? I will put together something a little more concrete >> >> >> >> as soon as I >> >> >> >> have a chance (I am travelling around quite a bit at the moment so >> >> >> >> it is >> >> >> >> difficult for me to focus). >> >> >> > >> >> >> > We're just getting started. Everything needs to be configurable. At >> >> >> > the moment, we assume that all parallel computation should be split >> >> >> > in >> >> >> > MPI::num_processes(). >> >> >> > >> >> >> > We can either add optional parameters to all parallel calls or add >> >> >> > global options to control how many processes are used. But I suggest >> >> >> >> >> >> Global options wouldn't solve anything (the point here is variable >> >> >> number of processes during the application lifetime), and it's probably >> >> >> enough with a single parameter, the communicator, perhaps permanently >> >> >> attached to each FunctionSpace (dofmap must be distributed over >> >> >> the processes in a communicator to be usable). Calls to >> >> >> MPI::num_processes() >> >> >> would be replaced by V.communicator().num_processes() or something. >> >> >> I get the waiting argument though. When stuff works, a search for >> >> >> MPI::* >> >> >> should reveal the places where "V.communicator()." should replace >> >> >> "MPI::", >> >> >> likely covering most places that need to be updated. >> >> >> >> >> >> Martin >> >> > >> >> > Sounds good. >> >> > >> >> > Speaking of the communicator, I'd like to add access to a default >> >> > communicator in the MPI class which could be accessed by >> >> > MPI::communicator() but couldn't figure out how to do this (I suspect >> >> > it's trivial). >> >> > >> >> > Then we could avoid including mpi.h in MeshPartitioning.cpp (look for >> >> > the two first FIXMEs in that file). If anyone knows how to do this, >> >> > let me know. >> >> >> >> You mean something like >> >> >> >> class MPI { >> >> public: >> >> >> >> static dolfin::Communicator & communicator() >> >> { return *mainCommunicator; } >> >> >> >> private: >> >> static shared_ptr<Communicator> mainCommunicator; >> >> }; >> >> >> >> ? >> >> >> >> This can then be initialized the first time MPI::communicator() >> >> is called, to be a communicator including all MPI processes. >> >> >> >> Epetra has a "serial communicator", which is probably needed >> >> as well when MPI is not in use, instead of adding #ifdefs all >> >> places MPI is used. >> >> >> >> Martin >> > >> > Yes, something like that. But what would the Communicator class look >> > like? If you know how to do this, feel free to just add it. >> > >> >> Basically, it looks like class MPI can be renamed to Communicator. >> All its functions must be made non-static, and it should have a member >> variable to hold the communicator ID. All occurences of MPI_COMM_WORLD >> in MPI.cpp should be replaced with the communicator ID variable. >> And all other places in the code where MPI_COMM_WORLD is needed, >> a communicator must be used instead (i.e. distribute). >> >> The communicator design will also affect the linear algebra backend, >> depending on how those libraries do this... The persons who design >> this should probably take a look at how both PETSc and Epetra >> handle this to get ideas. >> >> But I don't have time to really get involved with this now. >> It's not like this is a simple thing to do on the side ;) >> >> Martin > > I don't understand the point. The functions in MPI are static by > nature, how many processes, current process number etc.
Those functions depend on MPI_COMM_WORLD, which is the group of all processes. If you want to run one part of the application on a subset of the processes, you wouldn't want to use the current MPI::gather etc, since it works with all processes. Instead you can define a communicator for that subset of processes, and "how many processes", "current process number" etc is defined within that context. > But we do need something extra (the communicator) which can be shared > between objects that should share communication pattern. A "communicator" is a MPI term, and doesn't really involve anything I would associate with a "communication pattern", it's just a named group of processes. > We'll get to this in time, unless someone has time to think about it > now, and provide the implementation... ;-) Since current MPI ~ future Communicator the change should probably be fairly easy to do later on. Martin _______________________________________________ DOLFIN-dev mailing list [email protected] http://www.fenics.org/mailman/listinfo/dolfin-dev
