I think this also depends on the linker (or its configuration?) and possibly on the order in which the libraries are dlopen’ed.
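To make the dependency side of this concrete: when a component lists the library that provides its symbols as a NEEDED entry in its dynamic section, the loader pulls that library in as part of the dlopen, so the outcome no longer depends on what happened to be loaded earlier. Below is a minimal Python sketch (not part of Open MPI) that extracts the NEEDED entries from `readelf -d` output. The helper names are made up for illustration, and it assumes that binutils' readelf is installed and prints NEEDED entries in its usual "Shared library: [name]" form.

```python
import re
import subprocess
from typing import List

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[(?P<name>[^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Return the library names of all NEEDED entries in `readelf -d` text."""
    return [m.group("name") for m in _NEEDED.finditer(readelf_output)]


def declares_dependency(readelf_output: str, prefix: str) -> bool:
    """True if any NEEDED entry starts with `prefix` (hypothetical helper)."""
    return any(name.startswith(prefix) for name in needed_libraries(readelf_output))


def dynamic_section(path: str) -> str:
    """Run `readelf -d` on a shared object; assumes readelf is on PATH."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `declares_dependency(dynamic_section(path), "libmca_common_ompio")`, with `path` pointing at the installed mca_fcoll_individual.so, would show whether that component declares the common ompio library as a dependency.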
Note that the issue was initially reported (as warnings only) from ompi_info, so it is possible that we all missed it.

That being said, the errors make perfect sense to me.

FWIW, I installed a NixOS virtual machine and reproduced the issue right away.

Cheers,

Gilles

On Tuesday, June 12, 2018, Gabriel, Edgar <egabr...@central.uh.edu> wrote:

> No, I do not use --disable-dlopen. This is the other thing that is
> confusing to me: how come this error does not occur for anybody else?
> Thanks
> Edgar
>
> > -----Original Message-----
> > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Jeff
> > Squyres (jsquyres) via devel
> > Sent: Tuesday, June 12, 2018 9:11 AM
> > To: Open MPI Developers List <devel@lists.open-mpi.org>
> > Cc: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> > Subject: Re: [OMPI devel] Shared object dependencies
> >
> > How is it that Edgar is not running into these issues?
> >
> > Edgar: are you compiling with --disable-dlopen, perchance?
> >
> > > On Jun 12, 2018, at 6:04 AM, Gilles Gouaillardet
> > > <gilles.gouaillar...@gmail.com> wrote:
> > >
> > > Edgar,
> > >
> > > Regarding this specific problem, the issue is that mca_fcoll_individual.so
> > > did not depend on libmca_common_ompio.so. The PR does address that
> > > (among other abstraction violations).
> > >
> > > What about following up on GitHub?
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On Tuesday, June 12, 2018, Gabriel, Edgar <egabr...@central.uh.edu>
> > > wrote:
> > > So, I am still surprised to see this error message. If you look at,
> > > let's say, just one error message (and all the others are the same):
> > >
> > > > > > > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > > > > > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > > > > > > undefined symbol: mca_common_ompio_file_write (ignored)
> > >
> > > How come the symbol mca_common_ompio_file_write cannot be found?
> > > It is in the common library, so that symbol should always be there, shouldn't it?
> > >
> > > Your fix, Gilles (which we can discuss), will not address this problem, in
> > > my opinion. The symbols that are accessed from the ompio component at
> > > this point are used through a function pointer, not by name, and that
> > > should work in my opinion (e.g. we do not call
> > > mca_io_ompio_set_aggregator_props directly, but we call the function
> > > pointer fh->f_set_aggregator_props). The same goes for the MCA parameters:
> > > we access them through a function that is stored as a function pointer on
> > > the file handle structure.
> > >
> > > Thanks
> > > Edgar
> > >
> > > > -----Original Message-----
> > > > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of
> > > > Gilles Gouaillardet
> > > > Sent: Tuesday, June 12, 2018 3:28 AM
> > > > To: devel@lists.open-mpi.org
> > > > Subject: Re: [OMPI devel] Shared object dependencies
> > > >
> > > > Tyson,
> > > >
> > > > Thanks for taking the time to do some more tests.
> > > >
> > > > This is really a bug in Open MPI and, unlike what I thought earlier,
> > > > there are still some abstraction violations here and there related to
> > > > ompio.
> > > >
> > > > I filed https://github.com/open-mpi/ompi/pull/5263 in order to
> > > > address them.
> > > >
> > > > Meanwhile, you can configure Open MPI with --disable-dlopen and,
> > > > hopefully, that will be enough to hide the issue.
> > > >
> > > > Cheers,
> > > >
> > > > Gilles
> > > >
> > > > On 6/12/2018 5:58 AM, Tyson Whitehead wrote:
> > > > > I have now also tried release 3.1.0. Same thing (where I have
> > > > > replaced
> > > > > /nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with
> > > > > ...)
> > > > >
> > > > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > > > undefined symbol: mca_common_ompio_file_write (ignored)
> > > > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > > > mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so:
> > > > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > > > mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so:
> > > > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > > > mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so:
> > > > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > > > mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
> > > > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > > > Package: Open MPI nixbld@localhost Distribution
> > > > > Open MPI: 3.1.0
> > > > > Open MPI repo revision: v3.1.0
> > > > > Open MPI release date: May 07, 2018
> > > > > Open RTE: 3.1.0
> > > > > Open RTE repo revision: v3.1.0
> > > > > Open RTE release date: May 07, 2018
> > > > > OPAL: 3.1.0
> > > > > OPAL repo revision: v3.1.0
> > > > > OPAL release date: May 07, 2018
> > > > >
> > > > > I straced the process and, as far as I could tell, it was mostly
> > > > > just opening the shared objects in alphabetical order. I would
> > > > > appreciate any insight, such as whether this is normal behaviour I
> > > > > can ignore or not.
> > > > >
> > > > > Thanks!
> > > > > -Tyson
> > > > >
> > > > > On Fri, 8 Jun 2018 at 17:37, Tyson Whitehead
> > > > > <twhiteh...@gmail.com> wrote:
> > > > >> This email starts out talking about version 1.10.7 to give a
> > > > >> complete picture. I tested 2.1.3 as well; it also exhibits this
> > > > >> issue, although to a lesser extent, and I am asking for help
> > > > >> on that release.
> > > > >>
> > > > >> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a
> > > > >> newer libibverbs with a large set of drivers and got some strange
> > > > >> errors when running ompi_info (I've replaced the common prefix
> > > > >> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with
> > > > >> ...)
> > > > >>
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_btl_openib:
> > > > >> .../lib/openmpi/mca_btl_openib.so:
> > > > >> undefined symbol: mca_mpool_grdma_evict (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_individual:
> > > > >> .../lib/openmpi/mca_fcoll_individual.so:
> > > > >> undefined symbol: mca_io_ompio_file_write (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_ylib:
> > > > >> .../lib/openmpi/mca_fcoll_ylib.so:
> > > > >> undefined symbol: ompi_io_ompio_scatter_data (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_dynamic:
> > > > >> .../lib/openmpi/mca_fcoll_dynamic.so:
> > > > >> undefined symbol: ompi_io_ompio_allgatherv_array (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_two_phase:
> > > > >> .../lib/openmpi/mca_fcoll_two_phase.so:
> > > > >> undefined symbol: ompi_io_ompio_set_aggregator_props (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_static:
> > > > >> .../lib/openmpi/mca_fcoll_static.so:
> > > > >> undefined symbol: ompi_io_ompio_allgather_array (ignored)
> > > > >> Package: Open MPI nixbld@ Distribution
> > > > >> Open MPI: 1.10.7
> > > > >> Open MPI repo revision: v1.10.6-48-g5e373bf
> > > > >> Open MPI release date: May 16, 2017
> > > > >> Open RTE: 1.10.7
> > > > >> Open RTE repo revision: v1.10.6-48-g5e373bf
> > > > >> Open RTE release date: May 16, 2017
> > > > >> OPAL: 1.10.7
> > > > >> OPAL repo revision: v1.10.6-48-g5e373bf
> > > > >> OPAL release date: May 16, 2017
> > > > >> ...
> > > > >>
> > > > >> I dug into the first of these (figured out what library provided
> > > > >> it, looked at the declared dependencies, poked around in the
> > > > >> automake file) and, as far as I could determine, it seems that
> > > > >> mca_btl_openib.so simply isn't linked to list mca_mpool_grdma.so
> > > > >> (which provides the symbol) as a dependency.
> > > > >>
> > > > >> Seeing as 1.10.7 is no longer supported, I figured I would try
> > > > >> 2.1.3 in case this has been fixed. I compiled it as well, and
> > > > >> it seems all but the mca_fcoll_individual one have been resolved
> > > > >> (I've replaced
> > > > >> /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with
> > > > >> ...)
> > > > >>
> > > > >> [mon241:05544] mca_base_component_repository_open: unable to open
> > > > >> mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > > >> undefined symbol: ompio_io_ompio_file_read (ignored)
> > > > >> Package: Open MPI nixbld@ Distribution
> > > > >> Open MPI: 2.1.3
> > > > >> Open MPI repo revision: v2.1.2-129-gcfd8f3f
> > > > >> Open MPI release date: Mar 13, 2018
> > > > >> Open RTE: 2.1.3
> > > > >> Open RTE repo revision: v2.1.2-129-gcfd8f3f
> > > > >> Open RTE release date: Mar 13, 2018
> > > > >> OPAL: 2.1.3
> > > > >> OPAL repo revision: v2.1.2-129-gcfd8f3f
> > > > >> OPAL release date: Mar 13, 2018
> > > > >> ...
> > > > >>
> > > > >> Again, I was able to find this symbol in the mca_io_ompio.so library.
> > > > >> I looked through the source again, and it seems pretty clear that
> > > > >> the function is indeed called, but the library isn't linked to
> > > > >> list the mca_io_ompio.so library as a dependency.
> > > > >>
> > > > >> Looking through the various shared libraries in the
> > > > >> .../lib/openmpi directory, though, it seems none of them have
> > > > >> dependencies on each other. How is this supposed to work? Is the
> > > > >> component library just supposed to load everything so all symbols
> > > > >> get resolved? Is the above message I'm seeing actually an error then?
> > > > >>
> > > > >> Any insight would be appreciated.
> > > > >>
> > > > >> Thanks! -Tyson
> > > > >>
> > > > >> PS: Please note that the openmpi code was compiled without any
> > > > >> patches and without any special configure flags other than
> > > > >> --prefix=... (NixOS also adds --disable-static and
> > > > >> --disable-dependency-tracking by default, but I removed those; it
> > > > >> didn't make a difference).
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel