No, I do not use --disable-dlopen. That is the other thing that confuses me:
how come this error does not occur for anybody else?
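
One way to compare builds is to tabulate which component fails on which
symbol. Below is a minimal sketch in Python, assuming the raw ompi_info
output (not the ">"-quoted copy in this mail) is available as a string. The
function name missing_symbols and the regular expression are made up for
illustration and only cover the message formats quoted in this thread.

```python
import re

# Covers both message styles quoted in this thread:
#   "... unable to open mca_fcoll_individual: .../mca_fcoll_individual.so:
#    undefined symbol: NAME (ignored)"            (2.x / 3.x)
#   "... unable to open .../lib/openmpi/mca_btl_openib: .../mca_btl_openib.so:
#    undefined symbol: NAME (ignored)"            (1.10.x)
UNDEFINED_SYMBOL = re.compile(
    r"unable\s+to\s+open\s+(?P<component>\S+?):\s+\S+\.so:\s+"
    r"undefined\s+symbol:\s+(?P<symbol>\w+)\s+\(ignored\)"
)


def missing_symbols(log_text):
    """Return (component, symbol) pairs found in ompi_info error output."""
    # Long messages may be wrapped over several lines, so collapse
    # all runs of whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    pairs = []
    for match in UNDEFINED_SYMBOL.finditer(flat):
        # The 1.10.x format reports a path; keep only the component name.
        component = match.group("component").rsplit("/", 1)[-1]
        pairs.append((component, match.group("symbol")))
    return pairs
```

Running this over the output of two different builds and comparing the
resulting lists would show whether the same components are affected in both.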
Thanks
Edgar

> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Jeff
> Squyres (jsquyres) via devel
> Sent: Tuesday, June 12, 2018 9:11 AM
> To: Open MPI Developers List <devel@lists.open-mpi.org>
> Cc: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> Subject: Re: [OMPI devel] Shared object dependencies
> 
> How is it that Edgar is not running into these issues?
> 
> Edgar: are you compiling with --disable-dlopen, perchance?
> 
> 
> > On Jun 12, 2018, at 6:04 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
> >
> > Edgar,
> >
> > Regarding this specific problem, the issue is that mca_fcoll_individual.so
> > did not depend on libmca_common_ompio.so. The PR does address that
> > (among other abstraction violations).
> >
> > What about following up on GitHub?
> >
> > Cheers,
> >
> > Gilles
> >
> > On Tuesday, June 12, 2018, Gabriel, Edgar <egabr...@central.uh.edu>
> wrote:
> > So, I am still surprised to see this error message. Take, say, just one
> > of the error messages (all the others are the same):
> >
> > > > [orc-login2:107400] mca_base_component_repository_open: unable to
> > > > open
> > > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > > undefined symbol: mca_common_ompio_file_write (ignored)
> >
> > How come the symbol mca_common_ompio_file_write cannot be found? It is in
> > the common library, so that symbol should always be there, shouldn't it?
> > Your fix, Gilles (which we can discuss), will not address this problem in my
> > opinion. The symbols that are accessed from the ompio component at this
> > point are used through a function pointer, not by name, and that should
> > work in my opinion (e.g. we do not call mca_io_ompio_set_aggregator_props
> > directly, but we call the function pointer fh->f_set_aggregator_props).
> > The same goes for the MCA parameters: we access them through a function
> > that is stored as a function pointer on the file handle structure.
> >
> > Thanks
> > Edgar
> >
> >
> > > -----Original Message-----
> > > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of
> > > Gilles Gouaillardet
> > > Sent: Tuesday, June 12, 2018 3:28 AM
> > > To: devel@lists.open-mpi.org
> > > Subject: Re: [OMPI devel] Shared object dependencies
> > >
> > > Tyson,
> > >
> > >
> > > thanks for taking the time to do some more tests.
> > >
> > >
> > > > This is really a bug in Open MPI, and unlike what I thought earlier,
> > > > there are still some abstraction violations here and there related to
> > > > ompio.
> > > >
> > > > I filed https://github.com/open-mpi/ompi/pull/5263 in order to
> > > > address them.
> > > >
> > > > Meanwhile, you can configure Open MPI with --disable-dlopen and,
> > > > hopefully, that will be enough to hide the issue.
> > >
> > >
> > > Cheers,
> > >
> > >
> > > Gilles
> > >
> > >
> > > On 6/12/2018 5:58 AM, Tyson Whitehead wrote:
> > > > > I have now also tried release 3.1.0.  Same thing (where I have
> > > > > replaced
> > > > > /nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with
> > > > > ...)
> > > >
> > > > [orc-login2:107400] mca_base_component_repository_open: unable to
> > > > open
> > > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > > undefined symbol: mca_common_ompio_file_write (ignored)
> > > > [orc-login2:107400] mca_base_component_repository_open: unable to
> > > > open
> > > > mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so:
> > > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > > [orc-login2:107400] mca_base_component_repository_open: unable to
> > > > open
> > > > mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined
> > > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > > [orc-login2:107400] mca_base_component_repository_open: unable to
> > > > open
> > > > mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so:
> > > > undefined
> > > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > > [orc-login2:107400] mca_base_component_repository_open: unable to
> > > > open
> > > > mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined
> > > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > >                   Package: Open MPI nixbld@localhost Distribution
> > > >                  Open MPI: 3.1.0
> > > >    Open MPI repo revision: v3.1.0
> > > >     Open MPI release date: May 07, 2018
> > > > >                  Open RTE: 3.1.0
> > > >    Open RTE repo revision: v3.1.0
> > > >     Open RTE release date: May 07, 2018
> > > >                      OPAL: 3.1.0
> > > >         OPAL repo revision: v3.1.0
> > > >         OPAL release date: May 07, 2018
> > > >
> > > > I straced the process, and, as far as I could tell, it was just
> > > > mostly opening the shared objects in alphabetical order.  Would
> > > > appreciate any insight, such as whether this is normal behaviour I
> > > > can ignore or not?
> > > >
> > > > Thanks!  -Tyson
> > > > On Fri, 8 Jun 2018 at 17:37, Tyson Whitehead
> > > > <twhiteh...@gmail.com>
> > > wrote:
> > > > >> This email starts out talking about version 1.10.7 to give a
> > > > >> complete picture.  I tested 2.1.3 as well; it also exhibits this
> > > > >> issue, although to a lesser extent, and I am asking for help
> > > > >> on that release.
> > > >>
> > > > >> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a
> > > > >> newer libibverbs with a large set of drivers and got some strange
> > > > >> errors when running ompi_info (I've replaced the common
> > > > >> prefix
> > > > >> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with
> > > > >> ...)
> > > >>
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
> > > > >> undefined symbol: mca_mpool_grdma_evict (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_individual:
> > > > >> .../lib/openmpi/mca_fcoll_individual.so: undefined symbol:
> > > > >> mca_io_ompio_file_write (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so:
> > > > >> undefined symbol: ompi_io_ompio_scatter_data (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_dynamic:
> > > > >> .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol:
> > > > >> ompi_io_ompio_allgatherv_array (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_two_phase:
> > > > >> .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol:
> > > > >> ompi_io_ompio_set_aggregator_props (ignored)
> > > > >> [mon241:04077] mca: base: component_find: unable to open
> > > > >> .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
> > > > >> undefined symbol: ompi_io_ompio_allgather_array (ignored)
> > > >>                   Package: Open MPI nixbld@ Distribution
> > > >>                 Open MPI: 1.10.7
> > > >>   Open MPI repo revision: v1.10.6-48-g5e373bf
> > > >>    Open MPI release date: May 16, 2017
> > > >>                 Open RTE: 1.10.7
> > > >>   Open RTE repo revision: v1.10.6-48-g5e373bf
> > > >>    Open RTE release date: May 16, 2017
> > > >>                     OPAL: 1.10.7
> > > >>       OPAL repo revision: v1.10.6-48-g5e373bf
> > > >>        OPAL release date: May 16, 2017 ...
> > > >>
> > > > >> I dug into the first of these (figured out what library provided
> > > > >> it, looked at the declared dependencies, poked around in the
> > > > >> automake file), and, as far as I could determine, it seems that
> > > > >> mca_btl_openib.so simply isn't linked to list mca_mpool_grdma.so
> > > > >> (which provides the symbol) as a dependency.
> > > >>
> > > > >> Seeing as 1.10.7 is no longer supported, I figured I would try
> > > > >> 2.1.3 in case this has been fixed.  I compiled it up as well, and
> > > > >> it seems all but the mca_fcoll_individual one have been resolved
> > > > >> (I've replaced
> > > > >> /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with
> > > > >> ...)
> > > >>
> > > > >> [mon241:05544] mca_base_component_repository_open: unable to open
> > > > >> mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > > >> undefined symbol: ompio_io_ompio_file_read (ignored)
> > > >>                   Package: Open MPI nixbld@ Distribution
> > > >>                 Open MPI: 2.1.3
> > > >>   Open MPI repo revision: v2.1.2-129-gcfd8f3f
> > > >>    Open MPI release date: Mar 13, 2018
> > > >>                 Open RTE: 2.1.3
> > > >>   Open RTE repo revision: v2.1.2-129-gcfd8f3f
> > > >>    Open RTE release date: Mar 13, 2018
> > > >>                     OPAL: 2.1.3
> > > >>       OPAL repo revision: v2.1.2-129-gcfd8f3f
> > > >>        OPAL release date: Mar 13, 2018 ...
> > > >>
> > > > >> Again I was able to find this symbol in the mca_io_ompio.so library.
> > > > >> I looked through the source again, and it seems pretty clear that
> > > > >> the function is indeed called, but the library isn't linked to
> > > > >> list the mca_io_ompio.so library as a dependency.
> > > > >>
> > > > >> Looking through the various shared libraries in the
> > > > >> .../lib/openmpi directory, though, it seems none of them have
> > > > >> dependencies on each other.  How is this supposed to work?  Is the
> > > > >> component library just supposed to load everything so all symbols
> > > > >> get resolved?  Is the message I'm seeing above actually an error,
> > > > >> then?
> > > >>
> > > >> Any insight would be appreciated.
> > > >>
> > > >> Thanks!  -Tyson
> > > >>
> > > > >> PS:  Please note that the openmpi code was compiled without any
> > > > >> patches and without any special configure flags other than
> > > > >> --prefix=.... (NixOS also adds --disable-static and
> > > > >> --disable-dependency-tracking by default, but I removed those; it
> > > > >> didn't make a difference).
> > > > _______________________________________________
> > > > devel mailing list
> > > > devel@lists.open-mpi.org
> > > > https://lists.open-mpi.org/mailman/listinfo/devel
> > > >
> > >
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 