Re: [OMPI devel] RFC: revamp topo framework

George Bosilca Fri, 30 Oct 2009 11:54:08 -0400

Luigi,

The way Open MPI currently selects the network to be used between processes matches the first approach you proposed very well. As we support multiple networks simultaneously, a BTL (the low-level network driver) can service only a subset of peers. All other communications will automatically be redirected through another BTL (which has to be available). In the past there were some attempts to route messages, but this code is not in the trunk.
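
To make the idea concrete, here is a toy sketch of per-peer selection with fallback. It is not Open MPI code and does not use the real BTL interface: transport_t, torus_reaches, tcp_reaches and select_transport are hypothetical names, and the "torus" reachability rule (adjacent ranks only) is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a low-level driver: a name plus a
 * predicate telling whether it can reach a given peer. */
typedef struct {
    const char *name;
    int (*reaches)(int self, int peer);
} transport_t;

/* Toy rule (assumption): the torus driver only reaches adjacent ranks. */
static int torus_reaches(int self, int peer)
{
    int d = self - peer;
    return d == 1 || d == -1;
}

/* Toy rule (assumption): the fallback driver reaches every peer. */
static int tcp_reaches(int self, int peer)
{
    (void)self;
    (void)peer;
    return 1;
}

/* Return the first driver, in priority order, that can reach the peer,
 * or NULL if none can. */
static const transport_t *select_transport(const transport_t *list, size_t n,
                                           int self, int peer)
{
    assert(list != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (list[i].reaches(self, peer)) {
            return &list[i];
        }
    }
    return NULL;
}
```

With the list ordered as { torus, tcp }, a message from rank 3 to rank 4 would pick the torus entry, while a message from rank 3 to rank 7 would fall through to the tcp entry. If no fallback is in the list, the lookup returns NULL, which is why another driver has to be available.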


  george.

On Oct 30, 2009, at 04:47 , Luigi Scorzato wrote:

I am very interested in this, but let me explain in more detail my present situation and goals.
I am working in a group that is testing a system under development, which is connected with both:
- an ordinary all-to-all standard interface (where Open MPI is already available) but with limited performance and scalability.
- a custom 3D torus network, with no MPI available and custom low-level communication primitives (under development), from which we expect higher performance and scalability.
I have two approaches in mind:

1st approach.
Use the standard network interface to set up MPI. However, through a precompilation step, redefine a few MPI_ functions (MPI_Send(), MPI_Recv() and others) such that they call the torus primitives if the communication is between nearest neighbors, and fall back to standard MPI through the standard interface if not. This can only work if I can choose the MPI ranks of my system in such a way that MPI_Cart_create() will generate coordinates consistent with the physical topology.
*** There must be a place, somewhere in the Open MPI code, where the Cartesian coordinates are chosen, presumably as a deterministic function of the MPI ranks and the dimensions (as given by MPI_Dims_create). I expected it to be in MPI_Cart_create(), but I could not find it. Can anyone help? ***
This approach has obvious limitations of portability, besides requiring the availability of a fallback network, but it gives me full control of what I need to do, which is essential since my primary goal is to get a few important codes working on the new system asap.
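
For reference, a minimal sketch of the rank-to-coordinates mapping in question, assuming the row-major numbering that the MPI standard describes for Cartesian topologies and reorder = 0 in MPI_Cart_create(). It is not taken from the Open MPI source; rank_to_coords, coords_to_rank, is_torus_neighbor and MAX_DIMS are hypothetical names. The neighbor test is the kind of check a redefined MPI_Send() would need before choosing the torus primitives.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DIMS 8 /* hypothetical upper bound, for this sketch only */

/* Row-major rank -> coordinates: the last dimension varies fastest. */
static void rank_to_coords(int rank, int ndims, const int dims[], int coords[])
{
    assert(rank >= 0 && ndims > 0);
    for (int d = ndims - 1; d >= 0; --d) {
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* Inverse mapping: coordinates -> row-major rank. */
static int coords_to_rank(int ndims, const int dims[], const int coords[])
{
    int rank = 0;
    for (int d = 0; d < ndims; ++d) {
        assert(coords[d] >= 0 && coords[d] < dims[d]);
        rank = rank * dims[d] + coords[d];
    }
    return rank;
}

/* Returns 1 if ranks a and b are exactly one hop apart on a torus that
 * is periodic in every dimension, 0 otherwise. */
static int is_torus_neighbor(int a, int b, int ndims, const int dims[])
{
    int ca[MAX_DIMS], cb[MAX_DIMS];
    int hops = 0;

    assert(ndims <= MAX_DIMS);
    rank_to_coords(a, ndims, dims, ca);
    rank_to_coords(b, ndims, dims, cb);
    for (int d = 0; d < ndims; ++d) {
        int diff = abs(ca[d] - cb[d]);
        int wrap = dims[d] - diff; /* distance going around the ring */
        hops += (diff < wrap) ? diff : wrap;
    }
    return hops == 1;
}
```

With dims = {2, 3, 4}, rank 5 maps to coordinates (0, 1, 1), and rank 0 counts rank 3 as a neighbor because of the wraparound in the last dimension. Whether the physical torus matches this numbering depends on how the ranks are assigned to nodes at launch, which is the part that has to be controlled.
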
2nd approach.
Develop a new "torus" topo component, as explained by Jeff. This is certainly the *right* solution, but there are two problems:
- because of my poor familiarity with the Open MPI source code, I am not able to estimate how long it will take me.
- in a first phase, the torus primitives will not support all-to-all communications but only nearest-neighbor ones. Hence, full portability is excluded anyway and/or a fallback network is still needed. In other words, the topo component should be able to deal with two networks, and I have no idea how much this will complicate things.
I necessarily have to push the 1st approach for the moment, but I am very much interested in studying the 2nd, and if I see that it is realistic (given the limitations above) and safe, I may turn to it completely.
thanks for your feedback and best regards, Luigi

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
