The reason for this was so that plugins would not have to depend on a specific
version of Open MPI (or, even more restrictively, a specific tuple of Open MPI
libraries). This allows third parties to distribute binary plugins that work
with multiple versions of Open MPI.
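
To illustrate the mechanism (a sketch only, not a real MCA component; the symbol
and function names below are made up): a plugin built this way declares the
Open MPI symbols it needs, leaves them undefined, and lets the dynamic loader
resolve them at load time from whatever Open MPI libraries the host process
already carries.

    /* Hypothetical plugin source.  It is built without linking against
     * libopen-pal.so or any other Open MPI library, e.g.
     *     cc -shared -fPIC example_plugin.c -o example_plugin.so
     * so the same binary can be loaded into processes carrying different
     * Open MPI versions.  In real MCA components the undefined symbols are
     * things like opal_show_help; the one below is a made-up stand-in. */

    /* Supplied by the host process at load time, not by this plugin. */
    extern int opal_example_symbol(const char *msg);

    int example_component_open(void)
    {
        return opal_example_symbol("example component opened");
    }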

Sent from my phone. No type good.

On Oct 17, 2016, at 9:49 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:


Folks,


This is a follow-up to a question from the users ML.


Is there any reason why the plugins do not depend on the main Open MPI libs
(libopen-pal.so and libopen-rte.so, plus libmpi.so and liboshmem.so if needed)?

I guess that would solve the issue here without having to use RTLD_GLOBAL.


Cheers,


Gilles


-------- Forwarded Message --------
Subject:        Re: [OMPI users] Problem with double shared library
Date:   Tue, 18 Oct 2016 10:45:42 +0900
From:   Gilles Gouaillardet <gil...@rist.or.jp>
To:     Open MPI Users <us...@lists.open-mpi.org>



Sean,


If I understand correctly, you built a libtransport_mpi.so library that
depends on Open MPI, and your main program dlopen()s libtransport_mpi.so.

In this case, and at least for the time being, you need to use RTLD_GLOBAL in
your dlopen flags.
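
For example (a minimal sketch, assuming a hypothetical load_transport() helper
in the main program; link with -ldl on Linux):

    #include <dlfcn.h>
    #include <stdio.h>

    /* Hypothetical loader for the transport plugin.  The key change is
     * adding RTLD_GLOBAL, so that the Open MPI symbols pulled in through
     * libtransport_mpi.so (opal_show_help, mca_patcher_base_patch_t_class,
     * ...) become visible to the mca_*.so components that Open MPI itself
     * dlopen()s later. */
    void *load_transport(const char *path)
    {
        void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        }
        return handle;
    }

The call itself stays the same, e.g. load_transport("libtransport_mpi.so");
only the flags change.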


Cheers,


Gilles

On 10/18/2016 4:53 AM, Sean Ahern wrote:
Folks,

For our code, we have a communication layer that abstracts the code that does
the actual transfer of data. We call these "transports", and we link them as
shared libraries. We have created an MPI transport that compiles/links against
Open MPI 2.0.1 using the compiler wrappers. When I compile Open MPI with the
--disable-dlopen option (thus cramming all of Open MPI's plugins into the MPI
library directly), things work great with our transport shared library. But
when I have a "normal" Open MPI (without --disable-dlopen) and create the same
transport shared library, things fail. Upon launch, it appears that Open MPI is
unable to find the appropriate plugins:

[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_mmap: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_posix: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_show_help (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_sysv: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)

If I skip our shared libraries and instead write a standard MPI-based "hello, 
world" program that links against MPI directly (without --disable-dlopen), 
everything is again fine.
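
For reference, that direct-link test is presumably something like the following
(a minimal sketch, not Sean's actual program), built with mpicc and involving
no dlopen at all:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank = 0, size = 0;

        /* Linked directly against libmpi.so via the compiler wrapper. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("hello, world from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }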

It seems that the double dlopen is preventing Open MPI from finding its own
shared libraries.
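
One way to see the underlying symbol-visibility problem in isolation (a sketch,
not Sean's code; compile with cc demo.c -ldl and run with libtransport_mpi.so
on LD_LIBRARY_PATH):

    #define _GNU_SOURCE  /* for RTLD_DEFAULT on glibc */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_LOCAL is the default; it is spelled out here.  The Open MPI
         * libraries come in as dependencies of libtransport_mpi.so, but
         * their symbols stay out of the global scope. */
        void *h = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_LOCAL);
        if (h == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        /* Open MPI's dlopen()ed components effectively need this symbol to
         * be resolvable from the global scope; with RTLD_LOCAL it is not,
         * which is why the mca_shmem_* components above are "ignored". */
        void *sym = dlsym(RTLD_DEFAULT, "opal_show_help");
        printf("opal_show_help is %svisible in the global scope\n",
               sym != NULL ? "" : "NOT ");
        return 0;
    }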

Note: I do have LD_LIBRARY_PATH pointing to ..."openmpi-2.0.1/lib", as well as 
OPAL_PREFIX pointing to ..."openmpi-2.0.1".

Any thoughts about how I can try to tease out what's going wrong here?

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883



_______________________________________________
users mailing list
us...@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
