I wanted to add one item before I forget (although I agree with what Jeff 
said): the error messages shown remind me of the problem that we had with 
ompio in the 1.8/1.10 series when the RTLD_GLOBAL option was not correctly set. 
However, that was fixed in the 2.0 series and later, so if it shows 
up with later releases, it might be an indication of something else.
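To illustrate why RTLD_GLOBAL matters here: when one component calls a function defined in another component without listing it as a link-time dependency, the dynamic loader can only resolve that call if the providing component was opened earlier with RTLD_GLOBAL, which makes its symbols available when resolving objects loaded afterwards. The toy Python model below is only a sketch of that rule, not Open MPI's actual loader; the Plugin class and load_in_order function are hypothetical, and the symbol name is borrowed from the 1.10.7 log in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Plugin:
    """Hypothetical stand-in for a dlopen()ed component."""
    name: str
    exports: set[str]
    needs: set[str] = field(default_factory=set)


def load_in_order(plugins, rtld_global):
    """Open plugins in order; return {plugin name: sorted unresolved symbols}.

    rtld_global=True mimics dlopen(..., RTLD_GLOBAL): a plugin's exports join
    a process-wide namespace used to resolve plugins opened later.
    rtld_global=False mimics RTLD_LOCAL: exports stay private to the plugin.
    """
    global_namespace = set()
    failures = {}
    for plugin in plugins:
        missing = plugin.needs - plugin.exports - global_namespace
        if missing:
            # Treat this as the open failing with "undefined symbol".
            failures[plugin.name] = sorted(missing)
            continue
        if rtld_global:
            global_namespace |= plugin.exports
    return failures


if __name__ == "__main__":
    ompio = Plugin("mca_io_ompio", exports={"mca_io_ompio_file_write"})
    fcoll = Plugin("mca_fcoll_individual", exports=set(),
                   needs={"mca_io_ompio_file_write"})
    print(load_in_order([ompio, fcoll], rtld_global=True))   # {}
    print(load_in_order([ompio, fcoll], rtld_global=False))  # fcoll fails
    print(load_in_order([fcoll, ompio], rtld_global=True))   # fcoll fails
```

The third call shows that, in this model, RTLD_GLOBAL alone is not enough: the provider also has to be opened before the component that needs it.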

Edgar 

> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Jeff
> Squyres (jsquyres) via devel
> Sent: Friday, June 8, 2018 4:54 PM
> To: Open MPI Developers List <devel@lists.open-mpi.org>
> Cc: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> Subject: Re: [OMPI devel] Shared object dependencies
> 
> Before digging any deeper, did you perchance install multiple versions of Open
> MPI into the same prefix?
> 
> If so, remember that Open MPI installs lots of plugins.  The exact set of plugins
> changes every release.  So if you install version A.B.C into /opt/openmpi, and
> then install version X.Y.Z into /opt/openmpi, note that the installation of X.Y.Z
> did not *uninstall* A.B.C first.  Hence, you might still have some stale A.B.C
> components in the tree that Open MPI X.Y.Z may try to open.  Since the
> underlying libraries that these plugins use have now been upgraded to X.Y.Z,
> the stale A.B.C component may (and likely will) fail to open.
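The stale-component scenario above can be checked roughly from the install tree itself. This is a minimal sketch, assuming the usual layout of <prefix>/lib/libmpi.so with components in <prefix>/lib/openmpi/mca_*.so, and assuming file modification times reflect when each file was installed: components much older than libmpi.so are candidates for leftovers from an earlier install. The function name and the one-hour slack are hypothetical choices.

```python
from pathlib import Path


def find_possibly_stale_components(prefix, slack_seconds=3600):
    """List component files noticeably older than <prefix>/lib/libmpi.so.

    Heuristic only: it assumes one install writes all of its files at about
    the same time, so a component much older than libmpi.so was probably
    left behind by an earlier install into the same prefix.
    """
    prefix = Path(prefix)
    # Path.stat() follows the libmpi.so -> libmpi.so.N.N.N symlink.
    reference = (prefix / "lib" / "libmpi.so").stat().st_mtime
    stale = []
    for component in sorted((prefix / "lib" / "openmpi").glob("mca_*.so")):
        if component.stat().st_mtime < reference - slack_seconds:
            stale.append(component.name)
    return stale
```

For example, find_possibly_stale_components("/opt/openmpi") would list candidates under the example prefix. Timestamps carry no information where they are normalized (as in the Nix store), so treat an empty result there as inconclusive.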
> 
> If that's not what is happening, let us know and we can dig deeper.
> 
> 
> > On Jun 8, 2018, at 5:37 PM, Tyson Whitehead <twhiteh...@gmail.com> wrote:
> >
> > This email starts out talking about version 1.10.7 to give a complete
> > picture.  I tested 2.1.3 as well; it also exhibits this issue,
> > although to a lesser extent, and that is the release I am asking
> > for help with.
> >
> > I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> > libibverbs with a large set of drivers, and I get some strange errors
> > when running ompi_info (I've replaced the common prefix
> > /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...):
> >
> > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so: undefined symbol: mca_mpool_grdma_evict (ignored)
> > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so: undefined symbol: mca_io_ompio_file_write (ignored)
> > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so: undefined symbol: ompi_io_ompio_scatter_data (ignored)
> > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol: ompi_io_ompio_allgatherv_array (ignored)
> > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol: ompi_io_ompio_set_aggregator_props (ignored)
> > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined symbol: ompi_io_ompio_allgather_array (ignored)
> >                Package: Open MPI nixbld@ Distribution
> >               Open MPI: 1.10.7
> > Open MPI repo revision: v1.10.6-48-g5e373bf
> >  Open MPI release date: May 16, 2017
> >               Open RTE: 1.10.7
> > Open RTE repo revision: v1.10.6-48-g5e373bf
> >  Open RTE release date: May 16, 2017
> >                   OPAL: 1.10.7
> >     OPAL repo revision: v1.10.6-48-g5e373bf
> >      OPAL release date: May 16, 2017
> > ...
> >
> > I dug into the first of these (figured out what library provided it,
> > looked at the declared dependencies, poked around in the automake
> > file), and, as far as I could determine, mca_btl_openib.so simply
> > doesn't list mca_mpool_grdma.so (which provides the symbol) as a
> > dependency.
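When several components fail at once, as in the output above, it can help to pull out every component/symbol pair before tracking down providers one by one. This is a small Python sketch; the regular expression is fitted only to the two message formats quoted in this thread (1.10.x and 2.1.x), and the names UNDEFINED_SYMBOL and missing_symbols are hypothetical.

```python
import re
from pathlib import Path

# Fitted to the two message formats quoted in this thread:
#   1.10.x: "... component_find: unable to open <name>: <path>.so: undefined symbol: <sym> (ignored)"
#   2.1.x:  "... component_repository_open: unable to open <name>: <path>.so: undefined symbol: <sym> (ignored)"
UNDEFINED_SYMBOL = re.compile(
    r"unable to open\s+\S+?:\s+(?P<path>\S+\.so):\s+"
    r"undefined symbol:\s+(?P<symbol>\S+)"
)


def missing_symbols(log_text):
    """Map each component that failed to open to the symbol it could not resolve."""
    return {
        Path(match.group("path")).stem: match.group("symbol")
        for match in UNDEFINED_SYMBOL.finditer(log_text)
    }
```

Each symbol can then be searched for in the other components' dynamic symbol tables (for example with nm -D) to find its provider, as was done by hand here.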
> >
> > Seeing as 1.10.7 is no longer supported, I figured I would try 2.1.3
> > in case this has been fixed.  I compiled it as well, and it seems
> > all of the errors except the mca_fcoll_individual one have been resolved
> > (I've replaced /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3
> > with ...):
> >
> > [mon241:05544] mca_base_component_repository_open: unable to open mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so: undefined symbol: ompio_io_ompio_file_read (ignored)
> >                Package: Open MPI nixbld@ Distribution
> >               Open MPI: 2.1.3
> > Open MPI repo revision: v2.1.2-129-gcfd8f3f
> >  Open MPI release date: Mar 13, 2018
> >               Open RTE: 2.1.3
> > Open RTE repo revision: v2.1.2-129-gcfd8f3f
> >  Open RTE release date: Mar 13, 2018
> >                   OPAL: 2.1.3
> >     OPAL repo revision: v2.1.2-129-gcfd8f3f
> >      OPAL release date: Mar 13, 2018
> > ...
> >
> > Again I was able to find this symbol in the mca_io_ompio.so library.
> > I looked through the source again, and it seems pretty clear that the
> > function is indeed called, but mca_fcoll_individual.so doesn't list
> > mca_io_ompio.so as a dependency.
> >
> > Looking through the various shared libraries in the .../lib/openmpi
> > directory, though, it seems none of them have dependencies on each
> > other.  How is this supposed to work?  Is the component library just
> > supposed to load everything so all symbols get resolved?  Is the above
> > error I'm seeing a real error, then?
> >
> > Any insight would be appreciated.
> >
> > Thanks!  -Tyson
> >
> > PS:  Please note that the openmpi code was compiled without any
> > patches and without any special configure flags other than
> > --prefix=.... (NixOS also adds --disable-static and
> > --disable-dependency-tracking by default, but I removed those; it
> > didn't make a difference).
> > _______________________________________________
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com