On Sat, Dec 20, 2008 at 12:58:04PM +0100, Martin Sandve Alnæs wrote:
> On Fri, Dec 19, 2008 at 9:08 PM, Anders Logg <[email protected]> wrote:
> > On Fri, Dec 19, 2008 at 01:01:50PM +0100, Martin Sandve Alnæs wrote:
> >> On Fri, Dec 19, 2008 at 11:54 AM, Anders Logg <[email protected]> wrote:
> >> > On Fri, Dec 19, 2008 at 09:23:43AM +0100, Martin Sandve Alnæs wrote:
> >> >> On Fri, Dec 19, 2008 at 9:12 AM, Anders Logg <[email protected]> wrote:
> >> >> > On Fri, Dec 19, 2008 at 09:06:43AM +0100, Martin Sandve Alnæs wrote:
> >> >> >> On Fri, Dec 19, 2008 at 8:54 AM, Anders Logg <[email protected]> wrote:
> >> >> >> > On Fri, Dec 19, 2008 at 12:09:50PM +0900, Evan Lezar wrote:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Thu, Dec 18, 2008 at 4:25 AM, Johan Hake <[email protected]> wrote:
> >> >> >> >>
> >> >> >> >> On Wednesday 17 December 2008 20:19:52 Anders Logg wrote:
> >> >> >> >> > On Wed, Dec 17, 2008 at 08:13:03PM +0100, Johan Hake wrote:
> >> >> >> >> > > On Wednesday 17 December 2008 19:20:11 Anders Logg wrote:
> >> >> >> >> > > > Ola and I have now finished up the first round of getting DOLFIN to
> >> >> >> >> > > > run in parallel. In short, we can now parse meshes from file in
> >> >> >> >> > > > parallel and partition meshes in parallel (using ParMETIS).
> >> >> >> >> > > >
> >> >> >> >> > > > We reused some good ideas that Niclas Jansson had implemented in
> >> >> >> >> > > > PXMLMesh before, but have also made some significant changes as
> >> >> >> >> > > > follows:
> >> >> >> >> > > >
> >> >> >> >> > > > 1. The XML reader does not handle any partitioning.
> >> >> >> >> > > >
> >> >> >> >> > > > 2. The XML reader just reads in a chunk of the mesh data on each
> >> >> >> >> > > > processor (in parallel) and stores that into a LocalMeshData object
> >> >> >> >> > > > (one for each processor).
> >> >> >> >> > > > The data is just partitioned in blocks so
> >> >> >> >> > > > the vertices and cells may be completely unrelated.
> >> >> >> >> > > >
> >> >> >> >> > > > 3. The partitioning takes place in MeshPartitioning::partition,
> >> >> >> >> > > > which gets a LocalMeshData object on each processor. It then calls
> >> >> >> >> > > > ParMETIS to compute a partition (in parallel) and then redistributes
> >> >> >> >> > > > the data accordingly. Finally, a mesh is built on each processor using
> >> >> >> >> > > > the local data.
> >> >> >> >> > > >
> >> >> >> >> > > > 4. All direct MPI calls (except one which should be removed) have been
> >> >> >> >> > > > removed from the code. Instead, we mostly rely on
> >> >> >> >> > > > dolfin::MPI::distribute which handles most cases of parallel
> >> >> >> >> > > > communication and works with STL data structures.
> >> >> >> >> > > >
> >> >> >> >> > > > 5. There is just one ParMETIS call (no initial geometric
> >> >> >> >> > > > partitioning). It seemed like an unnecessary step, or are there good
> >> >> >> >> > > > reasons to perform the partitioning in two steps?
> >> >> >> >> > > >
> >> >> >> >> > > > For testing, go to sandbox/passembly, build and then run
> >> >> >> >> > > >
> >> >> >> >> > > > mpirun -n 4 ./demo
> >> >> >> >> > > > ./plot_partitions 4
> >> >> >> >> > >
> >> >> >> >> > > Looks beautiful!
> >> >> >> >> > >
> >> >> >> >> > > I threw a 3D mesh of 160K vertices onto it, and it was partitioned nicely
> >> >> >> >> > > in some 10 s, on my 2 core laptop.
> >> >> >> >> > >
> >> >> >> >> > > Johan
> >> >> >> >> >
> >> >> >> >> > Nice, in particular since we haven't run any 3D test cases ourselves,
> >> >> >> >> > just a tiny mesh of the unit square...
> >> >> >> >> > :-)
> >> >> >> >>
> >> >> >> >> Yes I thought so too ;)
> >> >> >> >>
> >> >> >> >> Johan
> >> >> >> >> _______________________________________________
> >> >> >> >> DOLFIN-dev mailing list
> >> >> >> >> [email protected]
> >> >> >> >> http://www.fenics.org/mailman/listinfo/dolfin-dev
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> While we are on the topic of parallelisation - I have some comments.
> >> >> >> >>
> >> >> >> >> A couple of months ago I was trying to parallelise some of my own code and ran
> >> >> >> >> into some problems with the communicators used for PETSc and SLEPc problems - I
> >> >> >> >> sort of got them to work, but never committed my changes because they were a
> >> >> >> >> little ungainly and I felt I needed to spend some more time on them.
> >> >> >> >>
> >> >> >> >> One thing that I did notice is that the user does not have much control over
> >> >> >> >> which parts of the process run in parallel - with the number of MPI processes
> >> >> >> >> deciding whether or not a parallel implementation should be used. In my case
> >> >> >> >> what I wanted to do was perform a frequency sweep and for each frequency point
> >> >> >> >> perform the same calculation (an eigenvalue problem) for that frequency. My
> >> >> >> >> intention was to distribute the frequency sweep over a number of processors and
> >> >> >> >> then handle the assembly and solution of each system separately. This is not
> >> >> >> >> possible as the code is now since the assembly and eigenvalue solvers all try
> >> >> >> >> to run in parallel as soon as the application is run as an MPI program.
> >> >> >> >>
> >> >> >> >> I know that this is not very descriptive, but does anyone else have thoughts on
> >> >> >> >> the matter?
> >> >> >> >> I will put together something a little more concrete as soon as I
> >> >> >> >> have a chance (I am travelling around quite a bit at the moment so it is
> >> >> >> >> difficult for me to focus).
> >> >> >> >
> >> >> >> > We're just getting started. Everything needs to be configurable. At
> >> >> >> > the moment, we assume that all parallel computation should be split in
> >> >> >> > MPI::num_processes().
> >> >> >> >
> >> >> >> > We can either add optional parameters to all parallel calls or add
> >> >> >> > global options to control how many processes are used. But I suggest
> >> >> >>
> >> >> >> Global options wouldn't solve anything (the point here is variable
> >> >> >> number of processes during the application lifetime), and it's probably
> >> >> >> enough with a single parameter, the communicator, perhaps permanently
> >> >> >> attached to each FunctionSpace (dofmap must be distributed over
> >> >> >> the processes in a communicator to be usable). Calls to MPI::num_processes()
> >> >> >> would be replaced by V.communicator().num_processes() or something.
> >> >> >> I get the waiting argument though. When stuff works, a search for MPI::*
> >> >> >> should reveal the places where "V.communicator()." should replace "MPI::",
> >> >> >> likely covering most places that need to be updated.
> >> >> >>
> >> >> >> Martin
> >> >> >
> >> >> > Sounds good.
> >> >> >
> >> >> > Speaking of the communicator, I'd like to add access to a default
> >> >> > communicator in the MPI class which could be accessed by
> >> >> > MPI::communicator() but couldn't figure out how to do this (I suspect
> >> >> > it's trivial).
> >> >> >
> >> >> > Then we could avoid including mpi.h in MeshPartitioning.cpp (look for
> >> >> > the two first FIXMEs in that file). If anyone knows how to do this,
> >> >> > let me know.
> >> >> >
> >> >> You mean something like
> >> >>
> >> >> class MPI {
> >> >> public:
> >> >>
> >> >>   static dolfin::Communicator & communicator()
> >> >>   { return *mainCommunicator; }
> >> >>
> >> >> private:
> >> >>   static shared_ptr<Communicator> mainCommunicator;
> >> >> };
> >> >>
> >> >> ?
> >> >>
> >> >> This can then be initialized the first time MPI::communicator()
> >> >> is called, to be a communicator including all MPI processes.
> >> >>
> >> >> Epetra has a "serial communicator", which is probably needed
> >> >> as well when MPI is not in use, instead of adding #ifdefs all
> >> >> places MPI is used.
> >> >>
> >> >> Martin
> >> >
> >> > Yes, something like that. But what would the Communicator class look
> >> > like? If you know how to do this, feel free to just add it.
> >> >
> >>
> >> Basically, it looks like class MPI can be renamed to Communicator.
> >> All its functions must be made non-static, and it should have a member
> >> variable to hold the communicator ID. All occurrences of MPI_COMM_WORLD
> >> in MPI.cpp should be replaced with the communicator ID variable.
> >> And all other places in the code where MPI_COMM_WORLD is needed,
> >> a communicator must be used instead (i.e. distribute).
> >>
> >> The communicator design will also affect the linear algebra backend,
> >> depending on how those libraries do this... The persons who design
> >> this should probably take a look at how both PETSc and Epetra
> >> handle this to get ideas.
> >>
> >> But I don't have time to really get involved with this now.
> >> It's not like this is a simple thing to do on the side ;)
> >>
> >> Martin
> >
> > I don't understand the point. The functions in MPI are static by
> > nature, how many processes, current process number etc.
>
> Those functions depend on MPI_COMM_WORLD, which
> is the group of all processes.
> If you want to run one part
> of the application on a subset of the processes, you wouldn't
> want to use the current MPI::gather etc, since it works with
> all processes. Instead you can define a communicator for
> that subset of processes, and "how many processes",
> "current process number" etc is defined within that context.
>
> > But we do need something extra (the communicator) which can be shared
> > between objects that should share communication pattern.
>
> A "communicator" is an MPI term, and doesn't really involve
> anything I would associate with a "communication pattern",
> it's just a named group of processes.
>
> > We'll get to this in time, unless someone has time to think about it
> > now, and provide the implementation... ;-)
>
> Since current MPI ~ future Communicator the change
> should probably be fairly easy to do later on.
>
> Martin

Yes, and then we can decide if we want the Communicator to be just a
handle to be passed to functions in dolfin::MPI or if it should also
provide functions. I imagine we might do something like this:

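  class Communicator;  // forward declaration, needed by MPI::foo below

  class MPI
  {
  public:
    // "args" is a placeholder for whatever arguments foo takes
    static void foo(args, const Communicator& communicator = default_communicator);
  };

  class Communicator
  {
  public:
    // Convenience wrapper: forwards to the static function with this communicator
    void foo(args) { MPI::foo(args, *this); }
  };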
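
To make the idea concrete, here is a minimal serial sketch of that layout. It is only an illustration under stated assumptions, not DOLFIN code: there is no real MPI underneath, the Communicator simply stores a rank and a size, and the names default_communicator(), num_processes(comm) and process_number(comm) are hypothetical. A real version would presumably wrap an MPI_Comm instead.

```cpp
#include <cassert>
#include <cstddef>

class Communicator;  // forward declaration so MPI can refer to it

// Static interface: every function takes a communicator and falls back
// to a default one (hypothetical name) when none is given.
class MPI
{
public:
  // Serial stand-in for "the group of all processes"
  static const Communicator& default_communicator();

  static std::size_t num_processes(const Communicator& comm = default_communicator());
  static std::size_t process_number(const Communicator& comm = default_communicator());
};

// Handle for a named group of processes. In this mock it only stores
// the local rank and the group size (assumption: no real MPI_Comm).
class Communicator
{
public:
  Communicator(std::size_t rank, std::size_t size) : _rank(rank), _size(size)
  { assert(size > 0 && rank < size); }

  // Convenience wrappers that forward to the static functions in MPI
  std::size_t num_processes() const { return MPI::num_processes(*this); }
  std::size_t process_number() const { return MPI::process_number(*this); }

private:
  friend class MPI;  // lets the static functions read the stored values
  std::size_t _rank;
  std::size_t _size;
};

inline const Communicator& MPI::default_communicator()
{
  static const Communicator world(0, 1);  // one process, rank 0
  return world;
}

inline std::size_t MPI::num_processes(const Communicator& comm)
{ return comm._size; }

inline std::size_t MPI::process_number(const Communicator& comm)
{ return comm._rank; }
```

With this, existing calls such as MPI::num_processes() keep working unchanged and use the default communicator, while code that runs on a subset of processes can either pass a communicator explicitly or call the member functions on it.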
--
Anders
