Edgar,

I checked the various release branches, and I think this issue was fixed by
https://github.com/open-mpi/ompi/commit/ccf76b779130e065de326f71fe6bac868c565300

This was back-ported into the v3.0.x branch, and that happened before the
v3.1.x branch was created. It has *not* been back-ported into the v2.x
series; as far as I am concerned, back-porting it would fix the abstraction
violation I mentioned earlier.

I noted that the fcoll framework is opened in mca_io_base_file_select(), so
another way (a bit convoluted imho, but it could require fewer changes)
would be to open the framework in the io/ompio component.

Cheers,

Gilles

On Sat, Jun 9, 2018 at 7:59 AM Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>
> I wanted to add one item before I forget (although I agree with what Jeff
> said): The error messages shown remind me of the problem that we had with
> ompio in the 1.8/1.10 series when the RTLD_GLOBAL option was not correctly
> set. However, that was fixed in the 2.0 series and going forward, so if
> that shows up with later releases, it might be an indication of something
> else.
>
> Edgar
>
> > -----Original Message-----
> > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Jeff
> > Squyres (jsquyres) via devel
> > Sent: Friday, June 8, 2018 4:54 PM
> > To: Open MPI Developers List <devel@lists.open-mpi.org>
> > Cc: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> > Subject: Re: [OMPI devel] Shared object dependencies
> >
> > Before digging any deeper, did you perchance install multiple versions
> > of Open MPI into the same prefix?
> >
> > If so, remember that Open MPI installs lots of plugins. The exact set
> > of plugins changes every release. So if you install version A.B.C into
> > /opt/openmpi, and then install version X.Y.Z into /opt/openmpi, note
> > that the installation of X.Y.Z did not *uninstall* A.B.C first. Hence,
> > you might still have some stale A.B.C components in the tree that Open
> > MPI X.Y.Z may try to open. Since the underlying libraries that these
> > plugins use have now been upgraded to X.Y.Z, the stale A.B.C component
> > may (and likely will) fail to open.
> >
> > If that's not what is happening, let us know and we can dig deeper.
> >
> > > On Jun 8, 2018, at 5:37 PM, Tyson Whitehead <twhiteh...@gmail.com> wrote:
> > >
> > > This email starts out talking about version 1.10.7 to give a complete
> > > picture. I tested 2.1.3 as well; it also exhibits this issue, although
> > > to a lesser extent, and I am asking for help on that release.
> > >
> > > I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> > > libibverbs with a large set of drivers and got some strange errors
> > > when running ompi_info (I've replaced the common prefix
> > > /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)
> > >
> > > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so: undefined symbol: mca_mpool_grdma_evict (ignored)
> > > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so: undefined symbol: mca_io_ompio_file_write (ignored)
> > > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so: undefined symbol: ompi_io_ompio_scatter_data (ignored)
> > > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol: ompi_io_ompio_allgatherv_array (ignored)
> > > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol: ompi_io_ompio_set_aggregator_props (ignored)
> > > [mon241:04077] mca: base: component_find: unable to open .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined symbol: ompi_io_ompio_allgather_array (ignored)
> > > Package: Open MPI nixbld@ Distribution
> > > Open MPI: 1.10.7
> > > Open MPI repo revision: v1.10.6-48-g5e373bf
> > > Open MPI release date: May 16, 2017
> > > Open RTE: 1.10.7
> > > Open RTE repo revision: v1.10.6-48-g5e373bf
> > > Open RTE release date: May 16, 2017
> > > OPAL: 1.10.7
> > > OPAL repo revision: v1.10.6-48-g5e373bf
> > > OPAL release date: May 16, 2017
> > > ...
> > >
> > > I dug into the first of these (figured out what library provided it,
> > > looked at the declared dependencies, poked around in the automake
> > > file), and, as far as I could determine, it seems that
> > > mca_btl_openib.so simply isn't linked to list mca_mpool_grdma.so
> > > (which provides the symbol) as a dependency.
> > >
> > > Seeing as 1.10.7 is no longer supported, I figured I would try 2.1.3
> > > in case this had been fixed. I compiled it as well, and it seems
> > > all but the mca_fcoll_individual one have been resolved (I've replaced
> > > /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with ...)
> > >
> > > [mon241:05544] mca_base_component_repository_open: unable to open mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so: undefined symbol: ompio_io_ompio_file_read (ignored)
> > > Package: Open MPI nixbld@ Distribution
> > > Open MPI: 2.1.3
> > > Open MPI repo revision: v2.1.2-129-gcfd8f3f
> > > Open MPI release date: Mar 13, 2018
> > > Open RTE: 2.1.3
> > > Open RTE repo revision: v2.1.2-129-gcfd8f3f
> > > Open RTE release date: Mar 13, 2018
> > > OPAL: 2.1.3
> > > OPAL repo revision: v2.1.2-129-gcfd8f3f
> > > OPAL release date: Mar 13, 2018
> > > ...
> > >
> > > Again I was able to find this symbol in the mca_io_ompio.so library.
> > > I looked through the source again, and it seems pretty clear that the
> > > function is indeed called, but the library isn't linked to list the
> > > mca_io_ompio.so library as a dependency.
> > >
> > > Looking through the various shared libraries in the .../lib/openmpi
> > > directory, though, it seems none of them have dependencies on each
> > > other. How is this supposed to work? Is the component library just
> > > supposed to load everything so all symbols get resolved? Is the above
> > > error I'm seeing actually an error, then?
> > >
> > > Any insight would be appreciated.
> > >
> > > Thanks! -Tyson
> > >
> > > PS: Please note that the openmpi code was compiled without any
> > > patches and without any special configure flags other than
> > > --prefix=.... (NixOS also adds --disable-static and
> > > --disable-dependency-tracking by default, but I removed those and it
> > > didn't make a difference).
> > > _______________________________________________
> > > devel mailing list
> > > devel@lists.open-mpi.org
> > > https://lists.open-mpi.org/mailman/listinfo/devel
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
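
For anyone triaging similar ompi_info output, here is a minimal Python sketch that groups the failing components by the symbol each one could not resolve. It is not part of Open MPI: the function name `failed_components` and the regular expression are my own assumptions, derived only from the two message formats quoted in this thread, so other releases may word the message differently.

```python
import re
from collections import defaultdict

# Matches the "unable to open ... undefined symbol: ... (ignored)" records
# quoted in this thread. The pattern is an assumption based on those samples
# only; it is not taken from the Open MPI sources.
_RECORD = re.compile(
    r"unable to open\s+(?P<component>\S+):\s+"
    r"(?P<path>\S+):\s+"
    r"undefined symbol:\s+(?P<symbol>\S+)\s+\(ignored\)"
)


def failed_components(text):
    """Map each component name to the list of symbols it failed to resolve."""
    failures = defaultdict(list)
    for match in _RECORD.finditer(text):
        # Drop any directory prefix so 1.10.x and 2.x records look alike.
        name = match.group("component").rsplit("/", 1)[-1]
        failures[name].append(match.group("symbol"))
    return dict(failures)


if __name__ == "__main__":
    # Two records copied from the thread (one per release).
    sample = (
        "[mon241:04077] mca: base: component_find: unable to open "
        ".../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so: "
        "undefined symbol: mca_mpool_grdma_evict (ignored)\n"
        "[mon241:05544] mca_base_component_repository_open: unable to open "
        "mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so: "
        "undefined symbol: ompio_io_ompio_file_read (ignored)\n"
    )
    for component, symbols in failed_components(sample).items():
        print(component, "->", ", ".join(symbols))
```

Knowing which component is missing which symbol makes it easier to tell the two explanations in this thread apart: stale plugins left over from an older install, or a component that depends on symbols from another component without declaring that dependency.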