I have now also tried release 3.1.0. Same thing (where I have replaced /nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with ...):

[orc-login2:107400] mca_base_component_repository_open: unable to open mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so: undefined symbol: mca_common_ompio_file_write (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so: undefined symbol: mca_common_ompio_register_print_entry (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol: mca_common_ompio_register_print_entry (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol: mca_common_ompio_register_print_entry (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined symbol: mca_common_ompio_register_print_entry (ignored)
Package: Open MPI nixbld@localhost Distribution
Open MPI: 3.1.0
Open MPI repo revision: v3.1.0
Open MPI release date: May 07, 2018
Open RTE: 3.1.0
Open RTE repo revision: v3.1.0
Open RTE release date: May 07, 2018
OPAL: 3.1.0
OPAL repo revision: v3.1.0
OPAL release date: May 07, 2018

I straced the process and, as far as I could tell, it was mostly just
opening the shared objects in alphabetical order.

I would appreciate any insight, such as whether this is normal
behaviour that I can ignore.

Thanks! -Tyson

On Fri, 8 Jun 2018 at 17:37, Tyson Whitehead <twhiteh...@gmail.com> wrote:
>
> This email starts out talking about version 1.10.7 to give a complete
> picture. I tested 2.1.3 as well, which also exhibits this issue,
> although to a lesser extent, and I am asking for help on that
> release.
>
> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> libibverbs with a large set of drivers and got some strange errors
> when running ompi_info (I've replaced the common prefix
> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)
>
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
> undefined symbol: mca_mpool_grdma_evict (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_individual:
> .../lib/openmpi/mca_fcoll_individual.so: undefined symbol:
> mca_io_ompio_file_write (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so:
> undefined symbol: ompi_io_ompio_scatter_data (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_dynamic:
> .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol:
> ompi_io_ompio_allgatherv_array (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_two_phase:
> .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol:
> ompi_io_ompio_set_aggregator_props (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
> undefined symbol: ompi_io_ompio_allgather_array (ignored)
> Package: Open MPI nixbld@ Distribution
> Open MPI: 1.10.7
> Open MPI repo revision: v1.10.6-48-g5e373bf
> Open MPI release date: May 16, 2017
> Open RTE: 1.10.7
> Open RTE repo revision: v1.10.6-48-g5e373bf
> Open RTE release date: May 16, 2017
> OPAL: 1.10.7
> OPAL repo revision: v1.10.6-48-g5e373bf
> OPAL release date: May 16, 2017
> ...
>
> I dug into the first of these (figured out what library provided it,
> looked at the declared dependencies, poked around in the automake
> file), and, as far as I could determine, it seems that
> mca_btl_openib.so simply isn't linked to list mca_mpool_grdma.so
> (which provides the symbol) as a dependency.
>
> Seeing as 1.10.7 is no longer supported, I figured I would try 2.1.3
> in case this has been fixed. I compiled it up as well, and it seems
> all but the mca_fcoll_individual one have been resolved (I've replaced
> /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with ...)
>
> [mon241:05544] mca_base_component_repository_open: unable to open
> mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> undefined symbol: ompio_io_ompio_file_read (ignored)
> Package: Open MPI nixbld@ Distribution
> Open MPI: 2.1.3
> Open MPI repo revision: v2.1.2-129-gcfd8f3f
> Open MPI release date: Mar 13, 2018
> Open RTE: 2.1.3
> Open RTE repo revision: v2.1.2-129-gcfd8f3f
> Open RTE release date: Mar 13, 2018
> OPAL: 2.1.3
> OPAL repo revision: v2.1.2-129-gcfd8f3f
> OPAL release date: Mar 13, 2018
> ...
>
> Again I was able to find this symbol in the mca_io_ompio.so library.
> I looked through the source again, and it seems pretty clear that the
> function is indeed called, but the library isn't linked to list the
> mca_io_ompio.so library as a dependency.
>
> Looking through the various shared libraries in the .../lib/openmpi
> directory, though, it seems none of them have dependencies on each
> other. How is this supposed to work? Is the component loader just
> supposed to load everything so all symbols get resolved? Is what I'm
> seeing above actually an error, then?
>
> Any insight would be appreciated.
>
> Thanks! -Tyson
>
> PS: Please note that the openmpi code was compiled without any
> patches and without any special configure flags other than
> --prefix=....
> (NixOS also adds --disable-static and
> --disable-dependency-tracking by default, but I removed those and it
> didn't make a difference).

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
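
For anyone wanting to repeat the check described above (finding which library provides each missing symbol), here is a minimal Python sketch. It is not part of Open MPI. The helper names parse_undefined, defined_symbols and find_providers are hypothetical, and it assumes GNU nm is on the PATH and that the ompi_info warnings have been captured as plain text.

```python
import re
import subprocess
from pathlib import Path

# Matches the tail of the warnings quoted in this thread, e.g.
# ".../lib/openmpi/mca_fcoll_static.so: undefined symbol: some_symbol (ignored)"
UNDEFINED_RE = re.compile(r"(\S+\.so): undefined symbol: (\S+)")


def parse_undefined(log_text):
    """Return (component file name, missing symbol) pairs from ompi_info output."""
    return [
        (path.rsplit("/", 1)[-1], symbol)
        for path, symbol in UNDEFINED_RE.findall(log_text)
    ]


def defined_symbols(nm_output):
    """Extract symbol names from the text printed by `nm -D --defined-only`."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print as "<address> <type> <name>".
        if len(fields) == 3:
            # Drop any "@VERSION" suffix that nm may append.
            names.add(fields[2].split("@")[0])
    return names


def find_providers(lib_dir, symbol):
    """List the shared objects under lib_dir that define the given symbol."""
    providers = []
    for so in sorted(Path(lib_dir).glob("*.so*")):
        result = subprocess.run(
            ["nm", "-D", "--defined-only", str(so)],
            capture_output=True,
            text=True,
        )
        if symbol in defined_symbols(result.stdout):
            providers.append(so.name)
    return providers
```

Feeding the captured warnings to parse_undefined gives the component/symbol pairs, and find_providers can then be pointed at the .../lib/openmpi directory (and at .../lib) for each symbol. Running `readelf -d` on the failing component shows whether the providing library appears among its NEEDED entries.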