On Aug 9, 2005, at 7:24 AM, Sridhar Chirravuri wrote:
I have fixed the timing issue between the server and client, and now I
could build Open MPI successfully.
Good.
Here is the output of ompi_info....
[root@micrompi-2 ompi]# ompi_info
Open MPI: 1.0a1r6760M
Note that as of this morning (US Eastern time), the current head is
r6774. Also be wary of any local mods you have put in the tree (as
noted by the "M"). Check "svn status" to see which files you have
modified, and "svn diff" to see the exact changes.
This time, I could see that the btl mvapi component is built.
But I am still seeing the same problem while running the Pallas
Benchmark, i.e., I still see that the data is passing over TCP/GigE and
NOT over Infiniband.
Please note that the 2nd generation point-to-point implementation is
still the default (where we have no IB support) -- all the IB support,
both mVAPI and Open IB, is in the 3rd generation point-to-point
implementation. You must explicitly request the 3rd generation
point-to-point implementation at run time to get IB support. Check out
slide 48, "Example: Forcing ob1/BTL" in the slides that we discussed on
the teleconference (were you on the teleconference? I attached copies
if you were not). The short version is that you need to tell Open MPI
to use the "ob1" pml component (3rd gen), not the default "teg" pml
component (2nd gen).
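As a rough sketch of what that looks like on the command line (the
process count and the Pallas binary name below are placeholders I am
assuming, not taken from your output):

```shell
# Request the 3rd generation PML ("ob1") for this run only.
# Assumption: "-np 2" and "./PMB-MPI1" are placeholders for your own
# process count and Pallas binary.
mpirun --mca pml ob1 -np 2 ./PMB-MPI1

# This should be equivalent: set the same MCA parameter through the
# environment, so every later mpirun in this shell picks it up.
export OMPI_MCA_pml=ob1
mpirun -np 2 ./PMB-MPI1
```

The first form is handy for one-off comparisons against the default
"teg" run; the second saves typing if you are doing many runs in a row.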
We'll eventually make the 3rd gen stuff be the default, and likely
remove all the 2nd gen stuff (i.e., definitely before release) -- we
just haven't done it yet because Tim and Galen are still polishing up
the 3rd gen stuff.
I have disabled building OpenIB and to do so I have touched
.ompi_ignore. This should not be a problem for MVAPI.
If the Open IB headers / libraries are not located in compiler-known
locations, then you shouldn't need to .ompi_ignore the tree (i.e.,
configure won't find the Open IB headers / libraries, and will
therefore automatically skip those components).
Again, it is our intention that users will neither know about nor have
to touch files in the distribution -- they only need use appropriate
options to "configure" and then "make".
I'm not sure if we have explicit options to disable a component in
configure -- Brian, can you comment here?
I have run autogen.sh, configure and make all. The output of the
autogen.sh, configure and make all commands is gzip'ed in the
ompi_out.tar.gz file, which is attached to this mail. This gzip file
also contains the Pallas Benchmark results. At the end of the Pallas
Benchmark output, you can find the error
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
..and Pallas just hung.
I have no clue about the above errors which are coming from Open MPI
source code.
The 2nd generation component has fallen into some disrepair -- I'd try
re-running with ob1 and see what happens. I have not seen such errors
when running PMB before, but I can try running it again to see if we've
broken something recently.
Is there anything that I am missing while building btl mvapi? Also,
has anyone built for mvapi and tested this OMPI stack? Please let me
know.
Galen Shipman and Tim Woodall are doing all the IB work.
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/