Pak Lui wrote:
Orion Poplawski wrote:


In our setup (which I don't believe is unusual), the nodes are connected by two networks: an "admin" network, which allows connections from outside the cluster, and an "MPI" network, a private GigE network that carries MPI traffic between the nodes:

       +---------admin net (192.168.0.X)--------+
       |                           |            |
+-----------+                 +--------+    +--------+
| SGE Master|                 | coop00 |    | coop01 |
|           |                 | coop00x|    | coop01x|
+-----------+                 +--------+    +--------+
                                   |            |
                                   +------------+

                                    MPI net (192.168.1.X)

So the names with the "x" suffix are the nodes' addresses on the MPI network.

Currently (loose integration), we create machines files like:

coop00x.cora.nwra.com cpu=2
coop01x.cora.nwra.com cpu=2

which makes the MPI traffic travel over the MPI network. I'm trying to duplicate this under "tight" integration.
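
For illustration, a machines file like the one above could be generated roughly as follows. This is only a sketch: it assumes the scheduler hands the job a host list whose lines start with the admin hostname and the slot count (as I understand SGE's $PE_HOSTFILE to do), and the function names and sample queue names are made up for the example.

```python
def mpi_hostname(admin_name, suffix="x"):
    """Append the suffix to the short host label.

    coop00.cora.nwra.com -> coop00x.cora.nwra.com
    """
    short, dot, domain = admin_name.partition(".")
    return short + suffix + dot + domain


def machines_lines(hostfile_text):
    """Turn 'hostname slots ...' lines into 'hostname cpu=N' lines."""
    lines = []
    for raw in hostfile_text.splitlines():
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        lines.append(f"{mpi_hostname(host)} cpu={slots}")
    return lines


# Sample input in the assumed 'hostname slots queue range' layout.
sample = """coop00.cora.nwra.com 2 all.q@coop00.cora.nwra.com UNDEFINED
coop01.cora.nwra.com 2 all.q@coop01.cora.nwra.com UNDEFINED"""

print("\n".join(machines_lines(sample)))
```

With the sample input this prints the two machines-file lines shown above.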

Well, this is what we did with LAM, and I naively assumed that since Open MPI uses the same machines file format, it worked the same way there. But once I finally read the FAQ (specifically: <http://www.open-mpi.org/faq/?category=tcp#tcp-selection>) I see that it works totally differently: the interface used for TCP traffic is selected through an MCA parameter rather than by the hostnames in the machines file.

So the default setup with gridengine integration works, and I just have:

btl_tcp_if_include = eth1

in my /etc/openmpi-mca-params.conf file.
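
The same parameter can also be set per job instead of system-wide, either with mpirun's --mca option or through an OMPI_MCA_-prefixed environment variable. A minimal sketch of both, assuming eth1 is the MPI-network interface as above; the helper names and ./my_mpi_app are placeholders for the example, not anything from our setup:

```python
import os


def mpirun_command(program, nprocs, interface="eth1"):
    """Build an mpirun argument list that restricts the TCP BTL to one interface."""
    return [
        "mpirun",
        "--mca", "btl_tcp_if_include", interface,
        "-np", str(nprocs),
        program,
    ]


def mca_environment(interface="eth1", base_env=None):
    """Return a copy of the environment with the MCA parameter set as a variable."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_btl_tcp_if_include"] = interface
    return env


# ./my_mpi_app is a hypothetical program name.
print(" ".join(mpirun_command("./my_mpi_app", 4)))
```

Either the argument list or the environment could then be passed to subprocess.run to launch the job.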

Sorry for all the confusion.

--
Orion Poplawski
System Administrator                  303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  or...@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com
