Hi, I have fixed the timing issue between the server and the client, and I was able to build Open MPI successfully. Here is the output of ompi_info:
[root@micrompi-2 ompi]# ompi_info
Open MPI: 1.0a1r6760M
Open MPI SVN revision: r6760M
Open RTE: 1.0a1r6760M
Open RTE SVN revision: r6760M
OPAL: 1.0a1r6760M
OPAL SVN revision: r6760M
Prefix: /openmpi
Configured architecture: x86_64-redhat-linux-gnu
Configured by: root
Configured on: Mon Aug 8 23:58:08 IST 2005
Configure host: micrompi-2
Built by: root
Built on: Tue Aug 9 00:09:10 IST 2005
Built host: micrompi-2
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: no
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: g77
Fortran77 compiler abs: /usr/bin/g77
Fortran90 compiler: none
Fortran90 compiler abs: none
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: no
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: yes
MPI parameter check: runtime
Memory profiling support: yes
Memory debugging support: yes
libltdl support: 1
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
MCA mpool: mvapi (MCA v1.0, API v1.0, Component v1.0)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)
MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)
MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)
MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)
MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA btl: mvapi (MCA v1.0, API v1.0, Component v1.0)
MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: host (MCA v1.0, API v1.0, Component v1.0)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)
MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)
MCA sds: env (MCA v1.0, API v1.0, Component v1.0)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)
This time, I can see that the btl mvapi component has been built.
But I am still seeing the same problem while running the Pallas benchmark:
the data is still passing over TCP/GigE and NOT over InfiniBand.
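(If it helps to narrow this down, I believe a run can be restricted to the mvapi component with something like the following; the exact MCA arguments and the benchmark binary name are my guesses and not verified:)
mpirun -np 2 --mca btl mvapi,sm,self ./PMB-MPI1
I am also not sure whether the default pml uses the BTLs at all; if it does not, perhaps --mca pml ob1 needs to be added as well.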
I have disabled building the openib component; to do so, I touched an .ompi_ignore file in it, which should not be a problem for mvapi.
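(Roughly, the command I used was something like the following; the exact component directory path is from memory and may differ:)
touch ompi/mca/btl/openib/.ompi_ignore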
I then re-ran autogen.sh, configure, and make all. The output of the autogen.sh, configure, and make all commands is gzip'ed in the ompi_out.tar.gz file attached to this mail. The gzip file also contains the output of the Pallas benchmark run. At the end of the Pallas benchmark output, you can find these errors:
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
...and then Pallas just hung.
I have no clue about the above errors, which are coming from the Open MPI source code.
The configure options that I used are:
./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/
and I exported the following environment variables:
export CFLAGS="-I/usr/local/topspin/include -I/usr/local/topspin/include/vapi"
export LDFLAGS="-lmosal -lvapi -L/usr/local/topspin/lib64"
export btl_mvapi_LIBS="-lvapi -lmosal -L/usr/local/topspin/lib64"
export btl_mvapi_LDFLAGS="$btl_mvapi_LIBS"
export btl_mvapi_CFLAGS="$CFLAGS"
export LD_LIBRARY_PATH=/usr/local/topspin/lib64
export PATH=/openmpi/bin:$PATH
We are using the Mellanox InfiniBand stack; we call it MVAPICH 092, which is an MPI stack over VAPI, i.e., InfiniBand.
vapi.h is located in /usr/local/topspin/include/vapi, and this path is included in CFLAGS.
libmosal and libvapi are located in the /usr/local/topspin/lib64 directory.
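(For reference, the files can be checked roughly like this; the shared library names and extensions are my assumption:)
ls -l /usr/local/topspin/include/vapi/vapi.h
ls -l /usr/local/topspin/lib64/libvapi.so /usr/local/topspin/lib64/libmosal.so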
Info about machine:
model name : Intel(R) Xeon(TM) CPU 3.20GHz
Linux micrompi-2 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005 x86_64
x86_64 x86_64 GNU/Linux
[root@micrompi-2 vapi]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant)
Is there anything that I am missing while building the mvapi btl? Also, has
anyone built and tested this OMPI stack with mvapi? Please let me know.
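(In case it is useful, I can also send the mvapi component's run-time parameters; I believe something like this will print them:)
ompi_info --param btl mvapi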
Thanks
-Sridhar
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Jeff Squyres
Sent: Monday, August 08, 2005 8:21 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
It looks like you are having timestamp issues, e.g.:
> make: Warning: File `Makefile.am' has modification time 3.6e+04 s in
> the future
We typically see this in environments where NFS clients are not properly
time-synchronized with the NFS server (e.g., via ntp, either to the NFS
server directly, or to a common parent ntp server, or something similar).
Automake-derived build systems are *extremely* sensitive to filesystem
timestamps because they are driven off Makefile dependencies. So if
you are working on a networked filesystem and do not have your time
tightly synchronized between the client and server, these kinds of
errors will occur.
Two fixes for this are:
1. Fix the time issue between the network filesystem client and server (for example, see the commands sketched below)
2. Build on a local, non-networked filesystem
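As a rough sketch (the hostnames here are placeholders, and ntpdate typically needs root), checking and correcting the skew on the NFS client might look like:
date && ssh <nfs-server> date        # compare the client and server clocks
ntpdate <your-ntp-server>            # step the client clock once
After that, running ntpd against the same server keeps the clocks from drifting apart again.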
On Aug 8, 2005, at 6:19 AM, Sridhar Chirravuri wrote:
>
> Hi,
>
> I was trying to build the latest code, but as I mentioned in one of my
> previous mails, the build is getting into a loop.
>
> [root@micrompi-1 ompi]# make all | tee mymake.log
>
> make: Warning: File `Makefile.am' has modification time 3.6e+04 s in
> the future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of
> AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> cd . && /bin/sh /ompi/config/missing --run autoconf
>
> /bin/sh ./config.status --recheck
>
> /bin/sh ./config.status
>
> Making all in config
>
> make[1]: make[1]: Entering directory `/ompi/config'
>
> Warning: File `Makefile.am' has modification time 3.6e+04 s in the
> future
>
> cd .. && make am--refresh
>
> make[2]: Entering directory `/ompi'
>
> make[2]: Warning: File `Makefile.am' has modification time 3.6e+04 s
> in the future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of
> AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> cd . && /bin/sh /ompi/config/missing --run autoconf
>
> /bin/sh ./config.status --recheck
>
> /bin/sh ./config.status
>
> make[2]: warning: Clock skew detected. Your build may be incomplete.
>
> make[2]: Leaving directory `/ompi'
>
> make[2]: Entering directory `/ompi'
>
> make[2]: Warning: File `Makefile.am' has modification time 3.6e+04 s
> in the future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of
> AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> cd . && /bin/sh /ompi/config/missing --run autoconf
>
> /bin/sh ./config.status --recheck
>
> /bin/sh ./config.status
>
> make[2]: warning: Clock skew detected. Your build may be incomplete.
>
> make[2]: Leaving directory `/ompi'
>
> cd .. && make am--refresh
>
> make[2]: make[2]: Entering directory `/ompi'
>
> Warning: File `Makefile.am' has modification time 3.6e+04 s in the
> future
>
> cd . && /bin/sh /ompi/config/missing --run aclocal-1.9
>
> /usr/share/aclocal/libgcrypt.m4:23: warning: underquoted definition of
> AM_PATH_LIBGCRYPT
>
> run info '(automake)Extending aclocal'
>
> or see
> http://sources.redhat.com/automake/automake.html#Extending-aclocal
>
> /usr/share/aclocal/ao.m4:9: warning: underquoted definition of
> XIPH_PATH_AO
>
> cd . && /bin/sh /ompi/config/missing --run automake-1.9 --foreign
>
> make[2]: *** [Makefile.in] Interrupt
>
> make[1]: *** [../configure] Interrupt
>
> make: *** [all-recursive] Interrupt
>
>
> The config.status --recheck is being issued from the Makefile. I moved
> config.status to config.status.old and touched a new config.status, but
> "make all" still goes into a loop.
>
> Has anyone tried building the latest code drop of Open MPI? Or has anyone
> seen this type of behavior?
>
> Please let me know.
>
> Thanks
>
> -Sridhar
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel