Luigi,
The way Open MPI currently selects the network to use between
processes matches the first approach you proposed very well. Because
we support multiple networks simultaneously, a BTL (the low-level
network driver) can service only a subset of peers. All other
communications are automatically redirected through another BTL (which
has to be available). In the past there were some attempts to route
messages, but this code is not in the trunk.
george.
On Oct 30, 2009, at 04:47, Luigi Scorzato wrote:
I am very interested in this, but let me explain my present situation
and goals in more detail.
I am working in a group that is testing a system under development,
which is connected with both:
- an ordinary all-to-all standard interface (where Open MPI is
already available) but with limited performance and scalability.
- a custom 3D torus network, with no MPI available and custom low-level
communication primitives (under development), from which we expect
higher performance and scalability.
I have two approaches in mind:
1st approach.
Use the standard network interface to set up MPI. However, through a
precompilation step, redefine a few MPI_ functions (MPI_Send(),
MPI_Recv() and others) such that they call the torus primitives if
the communication is between nearest neighbors, and fall back to
standard MPI through the standard interface if not. This can only
work if I can choose the MPI ranks of my system in such a way that
MPI_Cart_create() will generate coordinates consistent with the
physical topology.
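
The dispatch decision described above comes down to a nearest-neighbor
test on the torus. The following is only a minimal sketch of such a
test, assuming a fully periodic torus and coordinates that already
match the physical topology. The function name
torus_nearest_neighbors() is hypothetical; it is not part of Open MPI
or of the torus primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an Open MPI function): returns true if the
 * coordinates a and b are exactly one hop apart on a periodic torus
 * with the given extents, i.e. they differ by one step in exactly one
 * dimension, taking wraparound into account. */
bool torus_nearest_neighbors(int ndims, const int dims[],
                             const int a[], const int b[])
{
    assert(ndims > 0);
    int hops = 0;
    for (int i = 0; i < ndims; ++i) {
        int d = a[i] - b[i];
        if (d < 0) {
            d = -d;
        }
        /* On a ring of length dims[i], the other way around may be shorter. */
        int wrap = dims[i] - d;
        if (wrap < d) {
            d = wrap;
        }
        hops += d;
    }
    return hops == 1;
}
```

In a redefined MPI_Send(), a true result would select the torus
primitive and a false result would select the standard MPI call over
the fallback network.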
***There must be a place somewhere in the Open MPI code where the
Cartesian coordinates are chosen, presumably as a deterministic
function of the MPI ranks and the dimensions (as given by
MPI_Dims_create()). I expected it to be in MPI_Cart_create(), but I
could not find it. Can anyone help?***
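
For reference, the MPI standard specifies row-major numbering for
processes in a Cartesian topology, so the mapping can be written down
independently of where it lives in the source tree. The following is a
minimal sketch of that mapping, assuming ranks are not reordered. The
helper names rank_to_coords() and coords_to_rank() are hypothetical
and are not Open MPI functions.

```c
#include <assert.h>

/* Hypothetical helper: convert a rank into Cartesian coordinates in
 * row-major order (the last dimension varies fastest). */
void rank_to_coords(int rank, int ndims, const int dims[], int coords[])
{
    assert(rank >= 0 && ndims > 0);
    for (int i = ndims - 1; i >= 0; --i) {
        coords[i] = rank % dims[i];
        rank /= dims[i];
    }
}

/* Hypothetical helper: inverse mapping, from coordinates to rank. */
int coords_to_rank(int ndims, const int dims[], const int coords[])
{
    assert(ndims > 0);
    int rank = 0;
    for (int i = 0; i < ndims; ++i) {
        rank = rank * dims[i] + coords[i];
    }
    return rank;
}
```

With dims = {2, 3, 4}, rank 17 maps to coordinates (1, 1, 1) and back
to rank 17. Choosing the MPI ranks so that this mapping agrees with
the physical torus coordinates is the condition mentioned above.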
This approach has obvious portability limitations, besides
requiring the availability of a fallback network, but it gives me
full control of what I need to do, which is essential since my
primary goal is to get a few important codes working on the new
system asap.
2nd approach.
Develop a new "torus" topo component, as explained by Jeff. This is
certainly the *right* solution, but there are two problems:
- because of my poor familiarity with the Open MPI source code, I am
not able to estimate how long it will take me.
- in a first phase, the torus primitives will not support all-to-all
communication but only nearest-neighbor communication. Hence, full
portability is excluded anyway and/or a fallback network is still
needed. In other words, the topo component should be able to deal
with two networks, and I have no idea how much this will
complicate things.
I necessarily have to pursue the 1st approach for the moment, but I
am very much interested in studying the 2nd, and if I see that it is
realistic (given the limitations above) and safe, I may switch to it
completely.
thanks for your feedback and best regards, Luigi