Hmm, so either setting up a completely new workspace or switching to
OMPI_LINK_IFELSE gets me the right configure check. I think the
latter is what fixed my problem: with the link test, configure now
reports "no" for rdma_get_peer_addr (see below). I assume make all
should work now; I'll let you know if it doesn't...
48773 configure:123602: checking for rdma_get_peer_addr
48774 configure:123627: pgcc -o conftest -g -D_REENTRANT -I/opt/ofed/include -L/opt/ofed/lib64 conftest.c -lnsl -lutil -lpthread -libverbs >&5
48775 conftest.c:
48776 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 423)
48777 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
48778 conftest.o: In function `main':
48779 /share/home/00951/paklui/ompi-trunk5/config-data2-debug/conftest.c:423: undefined reference to `rdma_get_peer_addr'
48780 configure:123633: $? = 2
48781 configure: failed program was:
48782 | /* confdefs.h. */
48783 | #define PACKAGE_NAME "Open MPI"
...
49196 | #define HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE 1
49197 | #define HAVE_RDMA_RDMA_CMA_H 1
49198 | /* end confdefs.h. */
49199 | #include "rdma/rdma_cma.h"
49200 |
49201 | int
49202 | main ()
49203 | {
49204 | void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
49205 | ;
49206 | return 0;
49207 | }
49208 configure:123650: result: no
Pak Lui wrote:
For sanity's sake I also checked LD_LIBRARY_PATH; there doesn't seem to
be anything suspicious there either...
login3% echo $LD_LIBRARY_PATH
/opt/apps/pgi/7.1/linux86-64/7.1-2/libso:/opt/gsi-openssh-4.1/lib:/opt/gsi-openssh-4.1/lib:/opt/apps/binutils-amd/070220/lib64
I am trying Jeff's suggestion to replace OMPI_COMPILE_IFELSE with
OMPI_LINK_IFELSE. Will let you know.
Pak Lui wrote:
Jeff Squyres wrote:
Jon / Steve -- can you comment?
I tested with OFED 1.2.5 (which is what I assume you meant) and got:
checking for rdma_get_peer_addr... no
Because that function is not defined in OFED 1.2.5. Running with OFED
1.3 (where the function does exist), I get:
checking for rdma_get_peer_addr... yes
For me it seems to be running with 1.2.5.
login3% /opt/ofed/bin/ofed_info | head -1
OFED-1.2.5.5
There is no rdma_get_peer_addr or rdma_get_local_addr in these .so
files, which is presumably where they would come from.
login3% ls librdmacm.so*
librdmacm.so librdmacm.so.1 librdmacm.so.1.0.0 librdmacm.so.1.0.2
login3% nm librdmacm.so* | grep rdma_get_
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
And I don't see rdma_get_peer_addr in
/opt/ofed/include/rdma/rdma_cma.h either, so I don't know how the
compiler would know about the interface (it's not an inline function
there).
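The nm check above can be wrapped in a small helper so it is easy to repeat for other libraries or symbols. This is only a sketch: the function name lib_exports is made up for illustration (it is not part of Open MPI or OFED), and it assumes a GNU-style nm that accepts -D and --defined-only.

```shell
# Hypothetical helper: succeed only if shared library $1 exports a defined
# dynamic symbol whose name is exactly $2.
lib_exports() {
    # -D reads the dynamic symbol table; --defined-only hides undefined
    # references, so only symbols the library itself provides are listed.
    nm -D --defined-only "$1" 2>/dev/null |
        awk -v sym="$2" '
            { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
            name == sym { found = 1 }
            END { exit !found }
        '
}
```

Assuming the library lives under /opt/ofed/lib64 (the directory passed with -L in the link lines), it could be used as `lib_exports /opt/ofed/lib64/librdmacm.so rdma_get_peer_addr && echo yes || echo no`. Going by the nm output above, that would be expected to print "no" on this OFED 1.2.5.5 install.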
Outside of all the configure complexity, can you write a simple
program that calls that function and have it compile and link properly?
These are the references to rdma_get_peer_addr from the config.log:
47858 configure:120941: checking for rdma_get_peer_addr
47859 configure:120966: pgcc -c -g -D_REENTRANT -I/opt/ofed/include conftest.c >&5
47860 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 412)
47861 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
47862 configure:120972: $? = 0
47863 configure:120987: result: yes
...
48355 configure:123600: checking for rdma_get_peer_addr
48356 configure:123625: pgcc -c -g -D_REENTRANT -I/opt/ofed/include conftest.c >&5
48357 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 423)
48358 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
48359 configure:123631: $? = 0
48360 configure:123646: result: yes
Here's my program; I'm not sure it's doing this correctly. I am no m4
expert, so how do I run ompi_check_openib.m4 independently and see
the conftest.c?
login3% cat mytest.c
#include "rdma/rdma_cma.h"
int main (void) {
void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
return 0;
}
It gives me a warning if I just try to create an object, which is what I
see in the config.log.
login3% pgcc -c -g -D_REENTRANT -I/opt/ofed/include mytest.c
PGC-W-0155-Pointer value created from a nonlong integral type (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
login3% echo $?
0
But trying to create an executable gives me the error.
login3% pgcc -g -D_REENTRANT -I/opt/ofed/include mytest.c -o mytest
PGC-W-0155-Pointer value created from a nonlong integral type (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
/tmp/pgccjF6BryhFmWS.o: In function `main':
/share/home/00951/paklui/ompi-trunk5/config-data1-debug/mytest.c:3:
undefined reference to `rdma_get_peer_addr'
Hmm, any clues, comments?
I suppose we could change the AC_COMPILE_IFELSE in
config/ompi_check_openib.m4 to OMPI_LINK_IFELSE, but I'm a little
confused as to why it would compile successfully if the symbol
rdma_get_peer_addr is not declared anywhere (which it shouldn't be in
OFED 1.2 or 1.2.5, AFAIK)...
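The difference between a compile-only test and a link test can be reproduced outside configure with a small script. This is a rough sketch, not the actual Open MPI macro: the helper names write_conftest, compile_check and link_check are invented here, CC defaults to cc, and the test program declares the function itself as `char name ();` (the usual Autoconf approach) rather than relying on the header. In the logs quoted in this thread the compile-only test presumably passed because the undeclared call was treated as an implicitly declared function returning int, which would also explain the PGC-W-0155 warning; newer compilers may reject that outright.

```shell
# Hypothetical helpers for illustration only.
# Usage: compile_check FUNC [compiler flags...]
#        link_check    FUNC [compiler/linker flags...]

write_conftest() {
    # Declare the function by hand so the test does not depend on a header.
    printf 'char %s ();\nint main (void) { return %s (); }\n' "$1" "$1"
}

compile_check() (
    func=$1; shift
    dir=$(mktemp -d) || exit 2
    write_conftest "$func" > "$dir/conftest.c"
    # -c stops before linking, so a symbol missing from every library
    # goes unnoticed here.
    ${CC:-cc} -c -o "$dir/conftest.o" "$dir/conftest.c" "$@" >/dev/null 2>&1
    status=$?
    rm -rf "$dir"
    exit $status
)

link_check() (
    func=$1; shift
    dir=$(mktemp -d) || exit 2
    write_conftest "$func" > "$dir/conftest.c"
    # Building an executable forces the linker to resolve the symbol.
    ${CC:-cc} -o "$dir/conftest" "$dir/conftest.c" "$@" >/dev/null 2>&1
    status=$?
    rm -rf "$dir"
    exit $status
)
```

For example, `link_check rdma_get_peer_addr -L/opt/ofed/lib64 -lrdmacm && echo yes || echo no` would be expected to print "no" against a librdmacm that does not export the function, while `compile_check rdma_get_peer_addr` can still report success on the same system.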
On May 3, 2008, at 10:56 AM, Pak Lui wrote:
Sure Jeff, see attached.
Jeff Squyres wrote:
(moving to devel so that others are aware)
Crud. Can you send me your config.log? I don't know why it's able
to find rdma_get_peer_addr() in configure, but then later not able
to find it during the build - I'd like to see what happened
during configure.
On May 2, 2008, at 7:09 PM, Pak Lui wrote:
Hi Jeff,
It seems that the cpc3 merge causes my Ranger build to break. I
believe it is using OFED 1.2 but I don't know how to check. It
passes the rdma_get_peer_addr check that you added in
ompi_check_openib.m4. Is there a missing openib/OFED-related
#include somewhere?
1236 checking rdma/rdma_cma.h usability... yes
1237 checking rdma/rdma_cma.h presence... yes
1238 checking for rdma/rdma_cma.h... yes
1239 checking for rdma_create_id in -lrdmacm... yes
1240 checking for rdma_get_peer_addr... yes
pgCC -DHAVE_CONFIG_H -I. -I../../../../ompi/tools/ompi_info
-I../../../opal/include -I../../../orte/include -I../../../ompi/include
-I../../../opal/mca/paffinity/linux/plpa/src/libplpa
-DOMPI_CONFIGURE_USER="\"paklui\""
-DOMPI_CONFIGURE_HOST="\"login4.ranger.tacc.utexas.edu\""
-DOMPI_CONFIGURE_DATE="\"Fri May 2 17:07:01 CDT 2008\""
-DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\""
-DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O -DNDEBUG \""
-DOMPI_BUILD_CPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT\""
-DOMPI_BUILD_CXXFLAGS="\"-O -DNDEBUG \""
-DOMPI_BUILD_CXXCPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT\""
-DOMPI_BUILD_FFLAGS="\"\"" -DOMPI_BUILD_FCFLAGS="\"\""
-DOMPI_BUILD_LDFLAGS="\" \"" -DOMPI_BUILD_LIBS="\"-lnsl -lutil -lpthread\""
-DOMPI_CC_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgcc\""
-DOMPI_CXX_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgCC\""
-DOMPI_F77_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf77\""
-DOMPI_F90_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf95\""
-DOMPI_F90_BUILD_SIZE="\"small\"" -I../../../.. -I../../..
-I../../../../opal/include -I../../../../orte/include
-I../../../../ompi/include -D_REENTRANT -O -DNDEBUG -c -o version.o
../../../../ompi/tools/ompi_info/version.cc
/bin/sh ../../../libtool --tag=CXX --mode=link pgCC -O -DNDEBUG
-o ompi_info components.o ompi_info.o output.o param.o version.o
../../../ompi/libmpi.la -lnsl -lutil -lpthread
libtool: link: pgCC -O -DNDEBUG -o .libs/ompi_info components.o
ompi_info.o output.o param.o version.o ../../../ompi/.libs/libmpi.so
-L/opt/ofed/lib64 -libcm -lrdmacm -libverbs -lrt
/share/home/00951/paklui/ompi-trunk5/config-data1/orte/.libs/libopen-rte.so
/share/home/00951/paklui/ompi-trunk5/config-data1/opal/.libs/libopen-pal.so
-lnuma -ldl -lnsl -lutil -lpthread
-Wl,--rpath -Wl,/share/home/00951/paklui/ompi-trunk5/shared-install1/lib
[1] Exit 2 make install >& make.install.log.0
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_peer_addr'
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_local_addr'
make[2]: *** [ompi_info] Error 2
make[2]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi/tools/ompi_info'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi'
make: *** [install-recursive] Error 1
--
- Pak Lui
pak....@sun.com