Ah, my bad - thanks for correcting me!

I’ll have to ask folks tomorrow if we care about Myricom at this point.


> On Aug 24, 2015, at 10:52 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> Ralph
> 
> mx = Myricom (not Mellanox, which is mxm).
> So,  there is probably nobody to fix anything specific to the MX support.
> Thus if this newly reported problem is (as I am going to guess) in 
> config/ompi_check_mx.m4 then it may go unfixed.
> You say you and I are the only ones to care, and I think we both care for 
> reasons related to software quality rather than any desire to use MX.
> 
> However, the LDFLAGS issues with the tests don't seem to be related to a 
> specific network.
> Unfortunately, I am right now composing an email reporting that you and I 
> arrived at the WRONG fix for that yesterday.
> 
> -Paul
> 
> On Mon, Aug 24, 2015 at 10:26 AM, Ralph Castain <r...@open-mpi.org> wrote:
> You know, if it weren’t for the impact it would have on our users, I’d almost 
> say that if Mellanox doesn’t care enough to ensure this works, then maybe we 
> should just release and see if someone actually does care?
> 
> I’ll try again later today if/when I have time. Otherwise, I’ll raise it at 
> tomorrow’s telecon and see if anyone cares enough to fix it. At the moment, 
> it appears only you and I do - and I’m not sure I care enough to keep poking 
> it :-)
> 
> Thanks Paul!
> Ralph
> 
>> On Aug 24, 2015, at 10:19 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>> Sorry to yet again be the bearer of bad news.
>> 
>> I am now configuring with
>> --prefix=[...] --enable-debug --with-libfabric=/opt/libfabric-1.0.0 
>> --with-mx=/opt/mx2g --disable-dlopen
>> This is like the previous configuration that caused problems, but with 
>> "--disable-dlopen" instead of "--enable-static --disable-shared".
>> It seems that each time I try something new, something else breaks.
>> 
>> The build finishes fine.
>> I can compile the examples fine.
>> But I once again see a failure to run an example:
>> 
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> -------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> examples/ring_c: error while loading shared libraries: libmyriexpress.so: 
>> cannot open shared object file: No such file or directory
>> 
>> ldd agrees:
>> 
>> $ ldd examples/ring_c
>>         linux-vdso.so.1 =>  (0x00007fff332f0000)
>>         libmpi.so.12 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libmpi.so.12 (0x00007f1879305000)
>>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f18790d2000)
>>         libc.so.6 => /lib64/libc.so.6 (0x00007f1878d3e000)
>>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f1878b2c000)
>>         libmyriexpress.so => not found
>>         libfabric.so.1 => /opt/libfabric-1.0.0/lib/libfabric.so.1 (0x00007f18788fe000)
>>         libopen-rte.so.12 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-rte.so.12 (0x00007f1878565000)
>>         libopen-pal.so.13 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-pal.so.13 (0x00007f1878241000)
>>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f1878036000)
>>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f1877e32000)
>>         librt.so.1 => /lib64/librt.so.1 (0x00007f1877c29000)
>>         libm.so.6 => /lib64/libm.so.6 (0x00007f18779a5000)
>>         libutil.so.1 => /lib64/libutil.so.1 (0x00007f18777a2000)
>>         /lib64/ld-linux-x86-64.so.2 (0x00007f1879a2a000)
>>         libnl.so.1 => /lib64/libnl.so.1 (0x00007f187754f000)
>> 
>> However, this time it looks like everything is linked correctly:
>> 
>> $ mpicc --show examples/ring_c.c
>> gcc examples/ring_c.c 
>> -I/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/include
>>  -pthread -L/opt/mx2g/lib -L/opt/libfabric-1.0.0/lib -Wl,-rpath 
>> -Wl,/opt/mx2g/lib -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath 
>> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib 
>> -Wl,--enable-new-dtags 
>> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib 
>> -lmpi
>> 
>> $ chrpath --list examples/ring_c
>> examples/ring_c: 
>> RPATH=/opt/mx2g/lib:/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>> 
>> 
>> Looking a bit further I find that none of the MPI, OPAL or ORTE libs was 
>> built with the MX libdir in its rpath, though MPI and OPAL have libfabric:
>> 
>> $ chrpath --list INST/lib/libmpi.so      
>> INST/lib/libmpi.so: 
>> RPATH=/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>> $ chrpath --list INST/lib/libopen-pal.so 
>> INST/lib/libopen-pal.so: RPATH=::/opt/libfabric-1.1.0/lib
>> $ chrpath --list INST/lib/libopen-rte.so  
>> INST/lib/libopen-rte.so: 
>> RPATH=/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>> 
>> 
>> Extracted from the "make V=1" output, here are the (shortened) link commands 
>> for libmpi.so:
>> 
>> /bin/sh ../libtool  --tag=CC   --mode=link gcc -std=gnu99  -g 
>> -finline-functions -fno-strict-aliasing -pthread -version-info 12:0:0   -o 
>> libmpi.la -rpath 
>> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib  
>> [.lo and .la files] -lrt -lm -lutil
>> 
>> libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  [.o and .a files] 
>> -Wl,--no-whole-archive  -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath 
>> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs
>>  -Wl,-rpath 
>> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
>>  -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath 
>> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib 
>> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
>>  -L/opt/mx2g/lib -libverbs -lmyriexpress -L/opt/libfabric-1.0.0/lib 
>> /opt/libfabric-1.0.0/lib/libfabric.so -lpthread 
>> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs/libopen-rte.so
>>  
>> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs/libopen-pal.so
>>  -lnuma -ldl -lrt -lm -lutil  -pthread   -pthread -Wl,-soname 
>> -Wl,libmpi.so.12 -o .libs/libmpi.so.12.0.0
>> 
>> The appropriate "-L" and "-l" options are present for libmyriexpress, but 
>> there is no corresponding "-Wl,-rpath -Wl,...".
>> 
>> In contrast, libfabric gets both "-L" and "-Wl,-rpath -Wl,...".
>> Curiously, libfabric.so gets linked by full path, instead of "-lfabric".
>> I am not sure if that difference is meaningful or not, but thought I would 
>> mention it just in case it is.
>> 
>> -Paul
>> 
>> -- 
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/08/17821.php
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/08/17822.php
> 
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov 
> <mailto:phhargr...@lbl.gov>
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/08/17823.php
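
For anyone retracing the diagnosis quoted above, here is a minimal shell sketch of the two checks: list the libraries that ldd reports as "not found", and list the "-L" directories on a link line that have no matching "-Wl,-rpath -Wl,DIR" pair. The function names missing_libs and dirs_without_rpath are made up for illustration and are not part of Open MPI or libtool. The second function assumes the two-argument rpath spelling seen in the libtool output and will not recognize other spellings such as "-Wl,-rpath,DIR".

```shell
# Print the names of shared libraries the dynamic loader cannot resolve.
# Reads `ldd` output on stdin, so it also works on saved output.
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Print each -L directory on a link line that has no "-Wl,-rpath -Wl,DIR"
# counterpart. Reads the (unwrapped) link command on stdin.
dirs_without_rpath() {
    # Split the command into one token per line, then compare the two sets.
    tr -s ' \t' '\n\n' | awk '
        /^-L./                          { ldirs[substr($0, 3)] = 1 }
        prev == "-Wl,-rpath" && /^-Wl,/ { rpaths[substr($0, 5)] = 1 }
                                        { prev = $0 }
        END { for (d in ldirs) if (!(d in rpaths)) print d }
    ' | sort
}

# Example usage (paths are whatever your own build uses):
#   ldd examples/ring_c | missing_libs
#   make V=1 2>&1 | grep 'libtool: link:' | dirs_without_rpath
```

Fed the libmpi.so link line from the quoted message, once the email line wrapping is undone, the second function should report only /opt/mx2g/lib, which matches the observation that the MX libdir never made it into the rpath.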
