Out of curiosity: how are you testing these? I have more current versions of Slurm and would like to check whether the observations hold there.
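One way to reproduce these load failures outside of Open MPI is a tiny C harness. This is a sketch, not necessarily how Gilles tested. It dlopen()s a Slurm plugin with RTLD_NOW, so every undefined symbol has to resolve at load time, and with RTLD_LOCAL, on the assumption that this is how the MPI side ends up loading things. It can optionally preload a library with RTLD_GLOBAL first, which roughly models the RTLD_GLOBAL workaround. The function name check_plugin is hypothetical, and only standard dlfcn.h calls are used.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to load `plugin` with RTLD_NOW | RTLD_LOCAL, so every undefined symbol
 * must be resolved at dlopen() time. If `preload` is non-NULL it is opened
 * first with RTLD_GLOBAL, roughly what the RTLD_GLOBAL workaround provides.
 * Returns 0 if the plugin loads, -1 otherwise; `err` receives the reason
 * (for example an "undefined symbol" message from the dynamic linker). */
int check_plugin(const char *preload, const char *plugin,
                 char *err, size_t errlen)
{
    void *dep = NULL;
    void *handle;
    const char *msg;

    if (errlen > 0)
        err[0] = '\0';

    if (preload != NULL) {
        dep = dlopen(preload, RTLD_NOW | RTLD_GLOBAL);
        if (dep == NULL) {
            msg = dlerror();
            snprintf(err, errlen, "preload failed: %s",
                     msg != NULL ? msg : "unknown error");
            return -1;
        }
    }

    handle = dlopen(plugin, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        msg = dlerror();   /* copy the message before any further dl* call */
        snprintf(err, errlen, "%s", msg != NULL ? msg : "unknown error");
        if (dep != NULL)
            dlclose(dep);
        return -1;
    }

    dlclose(handle);
    if (dep != NULL)
        dlclose(dep);
    return 0;
}
```

Hypothetical usage: first call check_plugin(NULL, path, err, sizeof err), then check_plugin("libslurm.so", path, err, sizeof err), where path points at auth_munge.so in the local Slurm install (its location varies). Based on Gilles' report, the second call would be expected to succeed on 2.6.0. On 2.6.6 it would still fail on slurm_auth_get_arg_desc. Run each check in a fresh process, because dlclose() may not undo an earlier RTLD_GLOBAL load. On systems where dlopen() is not part of libc, link with -ldl.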
> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
>
> I'd like to take a step back...
>
> I previously tested with Slurm 2.6.0, and it complained about the
> slurm_verbose symbol, which is defined in libslurm.so,
> so with Slurm 2.6.0, RTLD_GLOBAL or relinking is OK.
>
> Now I tested with Slurm 2.6.6, and it complains about the
> slurm_auth_get_arg_desc symbol. This symbol is not
> defined in any dynamic library; it is internally defined in the static
> libcommon.a library, which is used to build the Slurm binaries.
>
> As far as I understand, auth_munge.so can only be invoked from a Slurm
> binary, which means it cannot be invoked from an MPI application,
> even if the application is linked with libslurm, libpmi, ...
>
> That looks like a Slurm design issue that the Slurm folks will take care of.
>
> Cheers,
>
> Gilles
>
> On 2014/12/02 12:33, Ralph Castain wrote:
>> Another option is to simply add the -lslurm -lauth flags to the pmix/s1
>> component, as this is the only place that requires it, and it won’t hurt
>> anything to do so.
>>
>>
>>> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet
>>> <gilles.gouaillar...@iferc.org>
>>> wrote:
>>>
>>> Jeff,
>>>
>>> FWIW, you can read my analysis of what is going wrong at
>>> http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php
>>>
>>> Bottom line: I agree this is a Slurm issue (Slurm plugins should depend
>>> on libslurm, but they do not, yet).
>>>
>>> A possible workaround would be to make the pmi component a "proxy" that
>>> dlopens, with RTLD_GLOBAL, the "real" component in which the work is done.
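A minimal sketch of the proxy idea above follows. It assumes the component loader opens components with RTLD_LOCAL. The proxy opens the real component with RTLD_GLOBAL, which places that component and the libraries it is linked against into the global symbol scope. Any plugin that libpmi later dlopen()s can then resolve against those libraries. The names pmi_proxy_t, pmi_proxy_open and pmi_proxy_close are hypothetical and are not part of Open MPI's component API. As Gilles found with 2.6.6, the approach only helps when the missing symbols live in a shared library that gets loaded. It cannot supply slurm_auth_get_arg_desc, which exists only in the static libcommon.a.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical proxy state: the proxy holds no PMI logic of its own; it only
 * loads the "real" component and hands back its entry point. */
typedef struct {
    void *real;   /* handle of the real component (the one linked to libpmi) */
    void *entry;  /* entry symbol looked up in the real component */
} pmi_proxy_t;

/* Open the real component with RTLD_GLOBAL so that it and its dependency tree
 * (libpmi and whatever libpmi is itself linked against) join the global
 * symbol scope. Returns 0 on success, -1 on failure with a message in `err`. */
int pmi_proxy_open(pmi_proxy_t *p, const char *real_path,
                   const char *entry_sym, char *err, size_t errlen)
{
    const char *msg;

    memset(p, 0, sizeof *p);
    if (errlen > 0)
        err[0] = '\0';

    p->real = dlopen(real_path, RTLD_NOW | RTLD_GLOBAL);
    if (p->real == NULL) {
        msg = dlerror();
        snprintf(err, errlen, "dlopen: %s", msg != NULL ? msg : "unknown error");
        return -1;
    }

    dlerror();                          /* clear any stale error first */
    p->entry = dlsym(p->real, entry_sym);
    msg = dlerror();
    if (msg != NULL || p->entry == NULL) {
        snprintf(err, errlen, "dlsym: %s", msg != NULL ? msg : "symbol is NULL");
        dlclose(p->real);
        memset(p, 0, sizeof *p);
        return -1;
    }
    return 0;
}

/* Release the real component (the dynamic linker may keep it mapped). */
void pmi_proxy_close(pmi_proxy_t *p)
{
    if (p->real != NULL)
        dlclose(p->real);
    memset(p, 0, sizeof *p);
}
```

In a real component, entry would be cast to the function-pointer type the framework expects before it is called. The rest of the proxy would forward to it unchanged.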
>>> That being said, the impact is quite limited (no direct launch in Slurm
>>> with pmi1, but pmi2 works fine), so it makes sense not to work around
>>> someone else's problem.
>>> That said, configure could detect this broken pmi1 and either skip
>>> building pmi1 support or print a user-friendly error message if pmi1 is used.
>>>
>>> Any thoughts?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/12/02 7:47, Jeff Squyres (jsquyres) wrote:
>>>> OK, if the problem is moot, great.
>>>>
>>>> (Side note: this is moot, so feel free to ignore it: even with this
>>>> explanation, I'm still not sure how RTLD_GLOBAL fixes the issue.)
>>>>
>>>>
>>>> On Dec 1, 2014, at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Easy enough to explain. We link libpmi into the pmix/s1 component. This
>>>>> library is missing its linkage to libslurm, which in turn links to
>>>>> libauth, where munge resides. So when we call a PMI function, libpmi
>>>>> references a call to munge for authentication and hits an “unresolved
>>>>> symbol” error.
>>>>>
>>>>> Moe acknowledges the error is in Slurm and is fixing the linkages so this
>>>>> problem goes away.
>>>>>
>>>>>
>>>>>> On Dec 1, 2014, at 2:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>>>>>> wrote:
>>>>>>
>>>>>> On Dec 1, 2014, at 5:07 PM, Ralph Castain <r...@open-mpi.org>
>>>>>> wrote:
>>>>>>
>>>>>>> FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against
>>>>>>> its dependencies (the pmi-2 one is correct). Moe is aware of the
>>>>>>> problem and is fixing it on their side. This won’t help existing
>>>>>>> installations until they upgrade, but I tend to agree with Jeff about
>>>>>>> not fixing other people’s problems.
>>>>>> Can you explain what is happening?
>>>>>>
>>>>>> I ask because I'm not sure I understand the problem well enough to see
>>>>>> how using RTLD_GLOBAL would fix it.
>>>>>> I.e., even if libpmi1.so isn't linked against
>>>>>> its dependencies properly, that shouldn't cause a problem if OMPI
>>>>>> components A and B are both linked against libpmi1.so, and then A is
>>>>>> loaded, and then B is loaded.
>>>>>>
>>>>>> ...or perhaps we can just discuss this on the call tomorrow?
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16383.php
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16384.php
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16386.php
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16387.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16388.php