How is it that Edgar is not running into these issues?

Edgar: are you compiling with --disable-dlopen, perchance?


> On Jun 12, 2018, at 6:04 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Edgar,
> 
> Regarding this specific problem, the issue is that mca_fcoll_individual.so
> did not depend on libmca_common_ompio.so.
> The PR addresses that (among other abstraction violations).
> 
> What about following up on GitHub?
> 
> Cheers,
> 
> Gilles
> 
> On Tuesday, June 12, 2018, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
> So, I am still surprised to see this error message. If you look at, let's say,
> just one error message (all the others are the same):
> 
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > undefined symbol: mca_common_ompio_file_write (ignored)
> 
> How come the symbol mca_common_ompio_file_write cannot be found? It is in
> the common library, so that symbol should always be there, shouldn't it?
> Your fix, Gilles (which we can discuss), will not address this problem in my
> opinion. The symbols in the ompio component that are accessed at this point
> are used through a function pointer, not by name, and that should work in my
> opinion (e.g. we do not call mca_io_ompio_set_aggregator_props directly, but
> we call the function pointer fh->f_set_aggregator_props). The same goes for
> the MCA parameters: we access them through a function that is stored as a
> function pointer on the file handle structure.
> 
> Thanks
> Edgar
>  
> 
> > -----Original Message-----
> > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gilles
> > Gouaillardet
> > Sent: Tuesday, June 12, 2018 3:28 AM
> > To: devel@lists.open-mpi.org
> > Subject: Re: [OMPI devel] Shared object dependencies
> > 
> > Tyson,
> > 
> > 
> > thanks for taking the time to do some more tests.
> > 
> > 
> > This is really a bug in Open MPI, and unlike what I thought earlier, there
> > are still some abstraction violations here and there related to ompio.
> > 
> > 
> > I filed https://github.com/open-mpi/ompi/pull/5263 in order to address them.
> > 
> > 
> > Meanwhile, you can configure Open MPI with --disable-dlopen and, hopefully,
> > that will be enough to hide the issue.
> > 
> > 
> > Cheers,
> > 
> > 
> > Gilles
> > 
> > 
> > On 6/12/2018 5:58 AM, Tyson Whitehead wrote:
> > > I have now also tried release 3.1.0.  Same thing (where I have replaced
> > > /nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with ....)
> > >
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > undefined symbol: mca_common_ompio_file_write (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so:
> > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined
> > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined
> > > symbol: mca_common_ompio_register_print_entry (ignored)
> > >                   Package: Open MPI nixbld@localhost Distribution
> > >                  Open MPI: 3.1.0
> > >    Open MPI repo revision: v3.1.0
> > >     Open MPI release date: May 07, 2018
> > >                  Open RTE: 3.1.0
> > >    Open RTE repo revision: v3.1.0
> > >     Open RTE release date: May 07, 2018
> > >                      OPAL: 3.1.0
> > >         OPAL repo revision: v3.1.0
> > >         OPAL release date: May 07, 2018
> > >
> > > I straced the process, and, as far as I could tell, it was just mostly
> > > opening the shared objects in alphabetical order.  Would appreciate
> > > any insight, such as whether this is normal behaviour I can ignore or
> > > not?
> > >
> > > Thanks!  -Tyson
> > > On Fri, 8 Jun 2018 at 17:37, Tyson Whitehead <twhiteh...@gmail.com> wrote:
> > >> This email starts out talking about version 1.10.7 to give a complete
> > >> picture.  I tested 2.1.3 as well; it also exhibits this issue,
> > >> although to a lesser extent, and I am asking for help on that
> > >> release.
> > >>
> > >> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> > >> libibverbs with a large set of drivers and got some strange errors
> > >> when running ompi_info (I've replaced the common prefix
> > >> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)
> > >>
> > >> [mon241:04077] mca: base: component_find: unable to open
> > >> .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
> > >> undefined symbol: mca_mpool_grdma_evict (ignored)
> > >> [mon241:04077] mca: base: component_find: unable to open
> > >> .../lib/openmpi/mca_fcoll_individual:
> > >> .../lib/openmpi/mca_fcoll_individual.so: undefined symbol:
> > >> mca_io_ompio_file_write (ignored)
> > >> [mon241:04077] mca: base: component_find: unable to open
> > >> .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so:
> > >> undefined symbol: ompi_io_ompio_scatter_data (ignored)
> > >> [mon241:04077] mca: base: component_find: unable to open
> > >> .../lib/openmpi/mca_fcoll_dynamic:
> > >> .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol:
> > >> ompi_io_ompio_allgatherv_array (ignored)
> > >> [mon241:04077] mca: base: component_find: unable to open
> > >> .../lib/openmpi/mca_fcoll_two_phase:
> > >> .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol:
> > >> ompi_io_ompio_set_aggregator_props (ignored)
> > >> [mon241:04077] mca: base: component_find: unable to open
> > >> .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
> > >> undefined symbol: ompi_io_ompio_allgather_array (ignored)
> > >>                   Package: Open MPI nixbld@ Distribution
> > >>                 Open MPI: 1.10.7
> > >>   Open MPI repo revision: v1.10.6-48-g5e373bf
> > >>    Open MPI release date: May 16, 2017
> > >>                 Open RTE: 1.10.7
> > >>   Open RTE repo revision: v1.10.6-48-g5e373bf
> > >>    Open RTE release date: May 16, 2017
> > >>                     OPAL: 1.10.7
> > >>       OPAL repo revision: v1.10.6-48-g5e373bf
> > >>        OPAL release date: May 16, 2017 ...
> > >>
> > >> I dug into the first of these (figured out what library provided it,
> > >> looked at the declared dependencies, poked around in the automake
> > >> file), and, as far as I could determine, it seems that
> > >> mca_btl_openib.so simply isn't linked to list mca_mpool_grdma.so
> > >> (which provides the symbol) as a dependency.
> > >>
> > >> Seeing as 1.10.7 is no longer supported, I figured I would try 2.1.3
> > >> in case this has been fixed.  I compiled it up as well, and it seems
> > >> all but the mca_fcoll_individual one have been resolved (I've
> > >> replaced
> > >> /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with ...)
> > >>
> > >> [mon241:05544] mca_base_component_repository_open: unable to open
> > >> mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > >> undefined symbol: ompio_io_ompio_file_read (ignored)
> > >>                   Package: Open MPI nixbld@ Distribution
> > >>                 Open MPI: 2.1.3
> > >>   Open MPI repo revision: v2.1.2-129-gcfd8f3f
> > >>    Open MPI release date: Mar 13, 2018
> > >>                 Open RTE: 2.1.3
> > >>   Open RTE repo revision: v2.1.2-129-gcfd8f3f
> > >>    Open RTE release date: Mar 13, 2018
> > >>                     OPAL: 2.1.3
> > >>       OPAL repo revision: v2.1.2-129-gcfd8f3f
> > >>        OPAL release date: Mar 13, 2018 ...
> > >>
> > >> Again, I was able to find this symbol in the mca_io_ompio.so library.
> > >> I looked through the source again, and it seems pretty clear that the
> > >> function is indeed called, but the library isn't linked to list the
> > >> mca_io_ompio.so library as a dependency.
> > >>
> > >> Looking through the various shared libraries in the .../lib/openmpi
> > >> directory, though, it seems none of them have dependencies on each
> > >> other.  How is this supposed to work?  Is the component library just
> > >> supposed to load everything so all symbols get resolved?  Is the above
> > >> error I'm seeing an error then?
> > >>
> > >> Any insight would be appreciated.
> > >>
> > >> Thanks!  -Tyson
> > >>
> > >> PS:  Please note that the openmpi code was compiled without any
> > >> patches and without any special configure flags other than
> > >> --prefix=.... (NixOS also adds --disable-static and
> > >> --disable-dependency-tracking by default, but I removed those; it
> > >> didn't make a difference).
> > > _______________________________________________
> > > devel mailing list
> > > devel@lists.open-mpi.org
> > > https://lists.open-mpi.org/mailman/listinfo/devel
> > >
> > 


-- 
Jeff Squyres
jsquy...@cisco.com
