Thanks, Jeff!

Gilles
"Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote: >The reasons for this was so that the plugins do not have to be dependent upon >a specific version of Open MPI (or, even more restrictive, a specific tuple of >Open MPI libraries). This allows third parties to distribute binary plugins >that work with multiple versions of open MPI. > >Sent from my phone. No type good. > > >On Oct 17, 2016, at 9:49 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote: > >Folks, > > >this is a follow up on a question from the users ML > > >is there any reason why plugins do not depend on the main openmpi libs >(libopen-pal.so and libopen-rte.so, libompi.so and liboshmem.so if needed) ? > >i guess that would solve the issue here without having to use RTLD_GLOBAL. > > >Cheers, > > >Gilles > > > >-------- Forwarded Message -------- Subject: Re: [OMPI users] Problem with >double shared libraryDate: Tue, 18 Oct 2016 10:45:42 +0900From: Gilles >Gouaillardet <gil...@rist.or.jp>To: Open MPI Users <us...@lists.open-mpi.org> > >Sean, > > >if i understand correctly, your built a libtransport_mpi.so library that >depends on Open MPI, and your main program dlopen libtransport_mpi.so. > >in this case, and at least for the time being, you need to use RTLD_GLOBAL in >your dlopen flags. > > >Cheers, > > >Gilles > > >On 10/18/2016 4:53 AM, Sean Ahern wrote: > >Folks, > > >For our code, we have a communication layer that abstracts the code that does >the actual transfer of data. We call these "transports", and we link them as >shared libraries. We have created an MPI transport that compiles/links against >OpenMPI 2.0.1 using the compiler wrappers. When I compile OpenMPI with >the--disable-dlopen option (thus cramming all of OpenMPI's plugins into the >MPI library directly), things work great with our transport shared library. >But when I have a "normal" OpenMPI (without --disable-dlopen) and create the >same transport shared library, things fail. Upon launch, it appears that >OpenMPI is unable to find the appropriate plugins: > > >[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open >mca_patcher_overwrite: >/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so: > undefined symbol: mca_patcher_base_patch_t_class (ignored) > >[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open >mca_shmem_mmap: >/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so: > undefined symbol: opal_show_help (ignored) > >[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open >mca_shmem_posix: >/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so: > undefined symbol: opal_show_help (ignored) > >[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open >mca_shmem_sysv: >/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so: > undefined symbol: opal_show_help (ignored) > >-------------------------------------------------------------------------- > >It looks like opal_init failed for some reason; your parallel process is > >likely to abort. There are many reasons that a parallel process can > >fail during opal_init; some of which are due to configuration or > >environment problems. 
> This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_init failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
>
> If I skip our shared libraries and instead write a standard MPI-based "hello,
> world" program that links against MPI directly (without --disable-dlopen),
> everything is again fine.
>
> It seems that having the double dlopen is causing problems for OpenMPI finding
> its own shared libraries.
>
> Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as well as
> OPAL_PREFIX pointing to …"openmpi-2.0.1".
>
> Any thoughts about how I can try to tease out what's going wrong here?
>
> -Sean
>
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883
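For readers hitting the same symptoms, here is a minimal sketch of the RTLD_GLOBAL workaround Gilles describes above. It assumes a host program that dlopens a transport library which was linked against Open MPI with the compiler wrappers; "libtransport_mpi.so" and "transport_init" are placeholder names for illustration, not symbols from the code discussed in the thread.

/*
 * Sketch only, not code from the thread. With the default RTLD_LOCAL, the
 * Open MPI symbols pulled in by the transport library (opal_show_help, etc.)
 * stay private to it, so the MCA plugins that Open MPI itself dlopens later
 * cannot resolve them. RTLD_GLOBAL promotes those symbols into the global
 * scope, which is the workaround suggested above.
 */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* RTLD_GLOBAL (instead of the default RTLD_LOCAL) is the key change. */
    void *handle = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up the transport's entry point (hypothetical symbol) and call it. */
    int (*transport_init)(void) =
        (int (*)(void)) dlsym(handle, "transport_init");
    if (transport_init == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    int rc = transport_init();

    dlclose(handle);
    return rc;
}

The host program can be built with something like "cc host.c -ldl"; only the transport library needs to be linked with the Open MPI wrappers, so the host does not have to link MPI directly.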
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel