So, I am still surprised to see this error message. If you look at, let's say, just one error message (all the others are the same):
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > undefined symbol: mca_common_ompio_file_write (ignored)

How come the symbol mca_common_ompio_file_write cannot be found? It is in the common code, so that symbol should always be there, shouldn't it?

Your fix, Gilles (which we can discuss), will not address this problem in my opinion. The symbols that are accessed from the ompio component at this point are used through a function pointer, not by name, and that should work in my opinion (e.g. we do not call mca_io_ompio_set_aggregator_props directly, but we call the function pointer fh->f_set_aggregator_props). The same holds for the MCA parameters: we access them through a function that is stored as a function pointer on the file handle structure.

Thanks
Edgar

> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
> Sent: Tuesday, June 12, 2018 3:28 AM
> To: devel@lists.open-mpi.org
> Subject: Re: [OMPI devel] Shared object dependencies
>
> Tyson,
>
> thanks for taking the time to do some more tests.
>
> This is really a bug in Open MPI, and unlike what I thought earlier, there are
> still some abstraction violations here and there related to ompio.
>
> I filed https://github.com/open-mpi/ompi/pull/5263 in order to address them.
>
> Meanwhile, you can configure Open MPI with --disable-dlopen and hopefully
> that will be enough to hide the issue.
>
> Cheers,
>
> Gilles
>
> On 6/12/2018 5:58 AM, Tyson Whitehead wrote:
> > I have now also tried release 3.1.0. Same thing (where I have replaced
> > /nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with ....)
> >
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > undefined symbol: mca_common_ompio_file_write (ignored)
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so:
> > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined
> > symbol: mca_common_ompio_register_print_entry (ignored)
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined
> > symbol: mca_common_ompio_register_print_entry (ignored)
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined
> > symbol: mca_common_ompio_register_print_entry (ignored)
> > Package: Open MPI nixbld@localhost Distribution
> > Open MPI: 3.1.0
> > Open MPI repo revision: v3.1.0
> > Open MPI release date: May 07, 2018
> > Open RTE: 3.1.0
> > Open RTE repo revision: v3.1.0
> > Open RTE release date: May 07, 2018
> > OPAL: 3.1.0
> > OPAL repo revision: v3.1.0
> > OPAL release date: May 07, 2018
> >
> > I straced the process and, as far as I could tell, it was mostly just
> > opening the shared objects in alphabetical order. I would appreciate
> > any insight, such as whether this is normal behaviour that I can
> > ignore.
> >
> > Thanks! -Tyson
> > On Fri, 8 Jun 2018 at 17:37, Tyson Whitehead <twhiteh...@gmail.com> wrote:
> >> This email starts out talking about version 1.10.7 to give a complete
> >> picture. I tested 2.1.3 as well; it also exhibits this issue,
> >> although to a lesser extent, and I am asking for help on that
> >> release.
> >>
> >> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> >> libibverbs with a large set of drivers and got some strange errors
> >> when running ompi_info (I've replaced the common prefix
> >> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)
> >>
> >> [mon241:04077] mca: base: component_find: unable to open
> >> .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
> >> undefined symbol: mca_mpool_grdma_evict (ignored)
> >> [mon241:04077] mca: base: component_find: unable to open
> >> .../lib/openmpi/mca_fcoll_individual:
> >> .../lib/openmpi/mca_fcoll_individual.so: undefined symbol:
> >> mca_io_ompio_file_write (ignored)
> >> [mon241:04077] mca: base: component_find: unable to open
> >> .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so:
> >> undefined symbol: ompi_io_ompio_scatter_data (ignored)
> >> [mon241:04077] mca: base: component_find: unable to open
> >> .../lib/openmpi/mca_fcoll_dynamic:
> >> .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol:
> >> ompi_io_ompio_allgatherv_array (ignored)
> >> [mon241:04077] mca: base: component_find: unable to open
> >> .../lib/openmpi/mca_fcoll_two_phase:
> >> .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol:
> >> ompi_io_ompio_set_aggregator_props (ignored)
> >> [mon241:04077] mca: base: component_find: unable to open
> >> .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
> >> undefined symbol: ompi_io_ompio_allgather_array (ignored)
> >> Package: Open MPI nixbld@ Distribution
> >> Open MPI: 1.10.7
> >> Open MPI repo revision: v1.10.6-48-g5e373bf
> >> Open MPI release date: May 16, 2017
> >> Open RTE: 1.10.7
> >> Open RTE repo revision: v1.10.6-48-g5e373bf
> >> Open RTE release date: May 16, 2017
> >> OPAL: 1.10.7
> >> OPAL repo revision: v1.10.6-48-g5e373bf
> >> OPAL release date: May 16, 2017
> >> ...
> >>
> >> I dug into the first of these (figured out which library provided it,
> >> looked at the declared dependencies, poked around in the automake
> >> file), and, as far as I could determine, mca_btl_openib.so simply
> >> isn't linked in a way that lists mca_mpool_grdma.so (which provides
> >> the symbol) as a dependency.
> >>
> >> Seeing as 1.10.7 is no longer supported, I figured I would try 2.1.3
> >> in case this has been fixed. I compiled it as well, and it seems
> >> all but the mca_fcoll_individual one have been resolved (I've
> >> replaced
> >> /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with ...)
> >>
> >> [mon241:05544] mca_base_component_repository_open: unable to open
> >> mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> >> undefined symbol: ompio_io_ompio_file_read (ignored)
> >> Package: Open MPI nixbld@ Distribution
> >> Open MPI: 2.1.3
> >> Open MPI repo revision: v2.1.2-129-gcfd8f3f
> >> Open MPI release date: Mar 13, 2018
> >> Open RTE: 2.1.3
> >> Open RTE repo revision: v2.1.2-129-gcfd8f3f
> >> Open RTE release date: Mar 13, 2018
> >> OPAL: 2.1.3
> >> OPAL repo revision: v2.1.2-129-gcfd8f3f
> >> OPAL release date: Mar 13, 2018
> >> ...
> >>
> >> Again I was able to find this symbol in the mca_io_ompio.so library.
> >> I looked through the source again, and it seems pretty clear that the
> >> function is indeed called, but the library isn't linked in a way that
> >> lists mca_io_ompio.so as a dependency.
> >>
> >> Looking through the various shared libraries in the .../lib/openmpi
> >> directory, though, it seems none of them have dependencies on each
> >> other. How is this supposed to work? Is the component library just
> >> supposed to load everything so all symbols get resolved? Is the above
> >> error I'm seeing actually an error then?
> >>
> >> Any insight would be appreciated.
> >>
> >> Thanks!
> >> -Tyson
> >>
> >> PS: Please note that the openmpi code was compiled without any
> >> patches and without any special configure flags other than
> >> --prefix=.... (NixOS also adds --disable-static and
> >> --disable-dependency-tracking by default, but I removed those; it
> >> didn't make a difference).
> > _______________________________________________
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
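
For anyone comparing the failures across the releases quoted in this thread, it may help to tabulate which component is missing which symbol. The following is only a minimal sketch, assuming the two message shapes quoted above (the 1.10.x "component_find" form and the 2.x/3.x "mca_base_component_repository_open" form). The function name missing_symbols is hypothetical and not part of Open MPI.

```python
import re

# Assumed message shapes, taken from the output quoted in this thread:
#   1.10.x:  "... unable to open <path>/mca_X: <path>/mca_X.so: undefined symbol: SYM (ignored)"
#   2.x/3.x: "... unable to open mca_X: <path>/mca_X.so: undefined symbol: SYM (ignored)"
_PATTERN = re.compile(
    r"unable\s+to\s+open\s+\S*?(mca_\w+):\s+\S+\.so:\s+"
    r"undefined\s+symbol:\s+(\w+)\s+\(ignored\)"
)


def missing_symbols(log_text):
    """Return {component: symbol} for each 'undefined symbol' message found.

    Hypothetical helper for triage only. If a component is reported more
    than once, the last message wins.
    """
    # Mail clients wrap long lines and add '>' quote markers, so drop the
    # markers and collapse all whitespace before matching.
    flat = re.sub(r"\s+", " ", log_text.replace(">", " "))
    return {component: symbol for component, symbol in _PATTERN.findall(flat)}


if __name__ == "__main__":
    import sys

    # Usage (assumed): ompi_info 2>&1 | python this_script.py
    for component, symbol in sorted(missing_symbols(sys.stdin.read()).items()):
        print(f"{component}: {symbol}")
```

Grouping the result by symbol prefix (for example mca_common_ompio_ versus ompi_io_ompio_) gives a quick view of which providing library each failing component expects to be loaded already.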