Thanks, Jeff!

Gilles

"Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>The reason for this is so that the plugins do not have to depend on a specific
>version of Open MPI (or, even more restrictively, a specific tuple of Open MPI
>libraries). This allows third parties to distribute binary plugins that work
>with multiple versions of Open MPI.
>
>On Oct 17, 2016, at 9:49 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
>Folks,
>
>
>This is a follow-up to a question from the users mailing list.
>
>
>Is there any reason why plugins do not depend on the main Open MPI libraries
>(libopen-pal.so and libopen-rte.so, plus libmpi.so and liboshmem.so if needed)?
>
>I guess that would solve the issue here without having to use RTLD_GLOBAL.
>
>
>Cheers,
>
>
>Gilles
>
>
>
>-------- Forwarded Message --------
>Subject: Re: [OMPI users] Problem with double shared library
>Date: Tue, 18 Oct 2016 10:45:42 +0900
>From: Gilles Gouaillardet <gil...@rist.or.jp>
>To: Open MPI Users <us...@lists.open-mpi.org>
>
>Sean,
>
>
>If I understand correctly, you built a libtransport_mpi.so library that
>depends on Open MPI, and your main program dlopens libtransport_mpi.so.
>
>In this case, and at least for the time being, you need to use RTLD_GLOBAL in
>your dlopen flags.
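>
>For illustration, a minimal sketch of that pattern, assuming the transport is
>loaded by hand with dlopen() (the library name and error handling here are
>placeholders, not the actual code); build with -ldl:
>
>    #include <dlfcn.h>
>    #include <stdio.h>
>
>    int main(void)
>    {
>        /* RTLD_GLOBAL promotes the symbols of libtransport_mpi.so, and of
>         * the Open MPI libraries it depends on, into the global scope so
>         * that the mca_*.so plugins Open MPI dlopen()s later can resolve
>         * symbols such as opal_show_help. */
>        void *handle = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_GLOBAL);
>        if (handle == NULL) {
>            fprintf(stderr, "dlopen failed: %s\n", dlerror());
>            return 1;
>        }
>        /* ... look up the transport's entry points with dlsym() ... */
>        return 0;
>    }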
>
>
>Cheers,
>
>
>Gilles
>
>
>On 10/18/2016 4:53 AM, Sean Ahern wrote:
>
>Folks,
>
>
>For our code, we have a communication layer that abstracts the code that does
>the actual transfer of data. We call these "transports", and we link them as
>shared libraries. We have created an MPI transport that compiles/links against
>Open MPI 2.0.1 using the compiler wrappers. When I compile Open MPI with the
>--disable-dlopen option (thus cramming all of Open MPI's plugins into the MPI
>library directly), things work great with our transport shared library. But
>when I have a "normal" Open MPI (without --disable-dlopen) and create the same
>transport shared library, things fail. Upon launch, it appears that Open MPI
>is unable to load the appropriate plugins:
>
>
>[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_patcher_overwrite:
>/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
>[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_mmap:
>/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
>[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_posix:
>/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_show_help (ignored)
>[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_sysv:
>/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
>
>--------------------------------------------------------------------------
>It looks like opal_init failed for some reason; your parallel process is
>likely to abort.  There are many reasons that a parallel process can
>fail during opal_init; some of which are due to configuration or
>environment problems.  This failure appears to be an internal failure;
>here's some additional information (which may only be relevant to an
>Open MPI developer):
>
>  opal_shmem_base_select failed
>  --> Returned value -1 instead of OPAL_SUCCESS
>--------------------------------------------------------------------------
>--------------------------------------------------------------------------
>It looks like orte_init failed for some reason; your parallel process is
>likely to abort.  There are many reasons that a parallel process can
>fail during orte_init; some of which are due to configuration or
>environment problems.  This failure appears to be an internal failure;
>here's some additional information (which may only be relevant to an
>Open MPI developer):
>
>  opal_init failed
>  --> Returned value Error (-1) instead of ORTE_SUCCESS
>--------------------------------------------------------------------------
>--------------------------------------------------------------------------
>It looks like MPI_INIT failed for some reason; your parallel process is
>likely to abort.  There are many reasons that a parallel process can
>fail during MPI_INIT; some of which are due to configuration or environment
>problems.  This failure appears to be an internal failure; here's some
>additional information (which may only be relevant to an Open MPI
>developer):
>
>  ompi_mpi_init: ompi_rte_init failed
>  --> Returned "Error" (-1) instead of "Success" (0)
>
>
>If I skip our shared libraries and instead write a standard MPI-based "hello, 
>world" program that links against MPI directly (without --disable-dlopen), 
>everything is again fine.
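>
>(For reference, a generic sketch of the kind of "hello, world" test meant
>here, not the actual program, built with the mpicc wrapper:
>
>    #include <mpi.h>
>    #include <stdio.h>
>
>    int main(int argc, char **argv)
>    {
>        int rank;
>        MPI_Init(&argc, &argv);
>        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>        printf("hello, world from rank %d\n", rank);
>        MPI_Finalize();
>        return 0;
>    }
>
>Because this is linked directly against libmpi.so, the Open MPI libraries are
>loaded at program startup and their symbols are visible to the plugins.)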
>
>
>It seems that having the double dlopen is keeping Open MPI from resolving
>symbols in its own shared libraries.
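>
>A quick way to test that theory (an illustrative sketch only; on glibc,
>RTLD_DEFAULT needs _GNU_SOURCE, and the program must be linked with -ldl):
>
>    #define _GNU_SOURCE
>    #include <dlfcn.h>
>    #include <stdio.h>
>
>    int main(void)
>    {
>        /* The default dlopen scope is RTLD_LOCAL, so the Open MPI libraries
>         * pulled in by libtransport_mpi.so stay out of the global scope. */
>        void *h = dlopen("libtransport_mpi.so", RTLD_NOW);
>        if (h == NULL) {
>            fprintf(stderr, "dlopen failed: %s\n", dlerror());
>            return 1;
>        }
>        /* This mirrors the lookup the plugin loader needs to succeed; NULL
>         * here matches the "undefined symbol: opal_show_help" errors above. */
>        void *sym = dlsym(RTLD_DEFAULT, "opal_show_help");
>        printf("opal_show_help %s visible in the global scope\n",
>               sym != NULL ? "is" : "is NOT");
>        return 0;
>    }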
>
>
>Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as well as 
>OPAL_PREFIX pointing to …"openmpi-2.0.1".
>
>
>Any thoughts about how I can try to tease out what's going wrong here?
>
>
>-Sean
>
>
>--
>
>Sean Ahern
>
>Computational Engineering International
>
>919-363-0883