Luigi,

The way Open MPI currently selects the network used between processes matches your first approach very well. Because we support multiple networks simultaneously, a BTL (the low-level network driver) may be able to reach only a subset of the peers. All other communication is automatically redirected through another BTL, which has to be available. In the past there were some attempts to route messages, but that code is not in the trunk.
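To make the per-peer fallback concrete, here is a minimal C sketch under simplified assumptions. Each transport declares which peers it can reach, and for every peer the highest-priority transport that reaches it is chosen. The names `transport_t`, `select_transport`, `ring_neighbor` and `everyone` are hypothetical illustrations, not Open MPI's BTL/BML interfaces, and Open MPI's real selection logic is more elaborate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport descriptor (NOT Open MPI's BTL structure). */
typedef struct {
    const char *name;
    int priority;                        /* higher wins when several transports reach a peer */
    bool (*reaches)(int self, int peer); /* can this transport talk to that peer? */
} transport_t;

/* For one peer, return the highest-priority transport that reaches it,
 * or NULL if no transport can reach the peer at all. */
const transport_t *select_transport(const transport_t *ts, size_t n,
                                    int self, int peer)
{
    assert(ts != NULL || n == 0);
    const transport_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (ts[i].reaches(self, peer) &&
            (best == NULL || ts[i].priority > best->priority)) {
            best = &ts[i];
        }
    }
    return best;
}

/* Hypothetical reachability rule: an 8-process 1D ring, used here as a
 * simplified stand-in for the nearest-neighbor-only torus network. */
bool ring_neighbor(int self, int peer)
{
    const int n = 8;
    return peer == (self + 1) % n || peer == (self + n - 1) % n;
}

/* Hypothetical reachability rule: the all-to-all standard interface. */
bool everyone(int self, int peer)
{
    (void)self;
    (void)peer;
    return true;
}
```

With a high-priority "torus" entry using `ring_neighbor` and a low-priority "tcp" entry using `everyone`, traffic to ranks 1 and 7 from rank 0 goes over the torus, while traffic to rank 4 falls back to the all-to-all interface without any change in the calling code.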

  george.

On Oct 30, 2009, at 04:47, Luigi Scorzato wrote:



I am very interested in this, but let me explain my present situation and goals in more detail.

I am working in a group that is testing a system under development which is connected by two networks:
- an ordinary all-to-all standard interface (where Open MPI is already available), but with limited performance and scalability;
- a custom 3D torus network with no MPI available and custom low-level communication primitives (under development), from which we expect higher performance and scalability.


I have two approaches in mind:

1st approach.
Use the standard network interface to set up MPI. However, through a precompilation step, redefine a few MPI_ functions (MPI_Send(), MPI_Recv() and others) so that they call the torus primitives if the communication is between nearest neighbors, and fall back to standard MPI over the standard interface otherwise. This can only work if I can choose the MPI ranks of my system so that MPI_Cart_create() generates coordinates consistent with the physical topology. ***There must be a place, somewhere in the Open MPI code, where the Cartesian coordinates are chosen, presumably as a deterministic function of the MPI ranks and the dimensions (as given by MPI_Dims_create()). I expected it to be in MPI_Cart_create(), but I could not find it. Can anyone help?*** This approach has obvious portability limitations, besides requiring a fallback network, but it gives me full control over what I need to do. That is essential, since my primary goal is to get a few important codes working on the new system as soon as possible.
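On the question of where the coordinates come from: the MPI standard specifies row-major numbering for processes in a Cartesian topology, and when MPI_Cart_create() is called with `reorder` set to false, each process keeps its rank from the old communicator. Under those two conditions the coordinates are a fixed function of rank and dims, which can be checked at run time with MPI_Cart_coords(). The C sketch below reproduces that row-major mapping and adds a nearest-neighbor test for a fully periodic torus, the kind of check a redefined MPI_Send() could use to choose between the torus primitives and the fallback. The helpers `rank_to_coords`, `coords_to_rank` and `torus_neighbors` are hypothetical, and the sketch assumes that rank r is launched on the node whose physical torus coordinates are `rank_to_coords(r)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DIMS 8

/* Row-major rank -> coordinates: the last dimension varies fastest. */
void rank_to_coords(int rank, int ndims, const int dims[], int coords[])
{
    assert(ndims > 0 && ndims <= MAX_DIMS);
    for (int d = ndims - 1; d >= 0; d--) {
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* Inverse mapping: coordinates -> row-major rank. */
int coords_to_rank(int ndims, const int dims[], const int coords[])
{
    int rank = 0;
    for (int d = 0; d < ndims; d++)
        rank = rank * dims[d] + coords[d];
    return rank;
}

/* True if ranks a and b are one step apart (with wraparound) in exactly
 * one dimension of a torus that is periodic in every dimension. */
bool torus_neighbors(int a, int b, int ndims, const int dims[])
{
    int ca[MAX_DIMS], cb[MAX_DIMS];
    rank_to_coords(a, ndims, dims, ca);
    rank_to_coords(b, ndims, dims, cb);

    int steps = 0;
    for (int d = 0; d < ndims; d++) {
        int diff = abs(ca[d] - cb[d]);
        if (diff == 0)
            continue;
        if (diff == 1 || diff == dims[d] - 1)
            steps++;          /* direct or wraparound neighbor in this dimension */
        else
            return false;     /* too far apart in this dimension */
    }
    return steps == 1;        /* a == b gives 0 steps, diagonals give 2 or more */
}
```

For a 4x4x4 torus, rank 27 maps to coordinates (1, 2, 3); ranks 0 and 48 are neighbors through the wraparound in the first dimension, while ranks 0 and 5 are not, because they differ in two dimensions.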


2nd approach.
Develop a new "torus" topo component, as explained by Jeff. This is certainly the *right* solution, but there are two problems:
- because I am not very familiar with the Open MPI source code, I am not able to estimate how long it will take me;
- in a first phase, the torus primitives will support only nearest-neighbor communication, not all-to-all communication. Hence, full portability is ruled out anyway and/or a fallback network is still needed. In other words, the topo component would have to deal with two networks, and I have no idea how much this would complicate things.


For the moment I have to pursue the 1st approach, but I am very interested in studying the 2nd, and if I see that it is realistic (given the limitations above) and safe, I may switch to it completely.

Thanks for your feedback, and best regards,
Luigi

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel