Out of curiosity - how are you testing these? I have more current versions of 
Slurm and would like to test the observations there.

> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
> 
> I'd like to take a step back ...
> 
> i previously tested with slurm 2.6.0, and it complained about the 
> slurm_verbose symbol that is defined in libslurm.so
> so with slurm 2.6.0, RTLD_GLOBAL or relinking is ok
> 
> now i tested with slurm 2.6.6 and it complains about the 
> slurm_auth_get_arg_desc symbol, and this symbol is not
> defined in any dynamic library. it is internally defined in the static 
> libcommon.a library, which is used to build the slurm binaries.
> 
> as far as i understand, auth_munge.so can only be invoked from a slurm 
> binary, which means it cannot be invoked from an mpi application
> even if it is linked with libslurm, libpmi, ...
> 
> that looks like a slurm design issue that the slurm folks will take care of.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/12/02 12:33, Ralph Castain wrote:
>> Another option is to simply add the -lslurm -lauth flags to the pmix/s1 
>> component as this is the only place that requires it, and it won’t hurt 
>> anything to do so.
>> 
>> 
>>> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>> 
>>> Jeff,
>>> 
>>> FWIW, you can read my analysis of what is going wrong at
>>> http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php 
>>> 
>>> bottom line, i agree this is a slurm issue (slurm plugins should depend
>>> on libslurm, but they do not, yet)
>>> 
>>> a possible workaround would be to make the pmi component a "proxy" that
>>> dlopens, with RTLD_GLOBAL, the "real" component in which the job is done.
>>> that being said, the impact is quite limited (no direct launch in slurm
>>> with pmi1, but pmi2 works fine) so it makes sense not to work around
>>> someone else's problem.
>>> and that being said, configure could detect this broken pmi1 and not
>>> build pmi1 support or print a user friendly error message if pmi1 is used.
>>> 
>>> any thoughts ?
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> On 2014/12/02 7:47, Jeff Squyres (jsquyres) wrote:
>>>> Ok, if the problem is moot, great.
>>>> 
>>>> (sidenote: this is moot, so ignore this if you want: with this 
>>>> explanation, I'm still not sure how RTLD_GLOBAL fixes the issue)
>>>> 
>>>> 
>>>> On Dec 1, 2014, at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> Easy enough to explain. We link libpmi into the pmix/s1 component. This 
>>>>> library is missing the linkage to libslurm that contains the linkage to 
>>>>> libauth where munge resides. So when we call a PMI function, libpmi 
>>>>> references a call to munge for authentication and hits an “unresolved 
>>>>> symbol” error.
>>>>> 
>>>>> Moe acknowledges the error is in Slurm and is fixing the linkages so this 
>>>>> problem goes away.
>>>>> 
>>>>> 
>>>>>> On Dec 1, 2014, at 2:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> On Dec 1, 2014, at 5:07 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> 
>>>>>>> FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against 
>>>>>>> its dependencies (the pmi-2 one is correct).  Moe is aware of the 
>>>>>>> problem and fixing it on their side. This won’t help existing 
>>>>>>> installations until they upgrade, but I tend to agree with Jeff about 
>>>>>>> not fixing other people’s problems.
>>>>>> Can you explain what is happening?
>>>>>> 
>>>>>> I ask because I'm not sure I understand the problem such that using 
>>>>>> RTLD_GLOBAL would fix it.  I.e., even if libpmi1.so isn't linked against 
>>>>>> its dependencies properly, that shouldn't cause a problem if OMPI 
>>>>>> components A and B are both linked against libpmi1.so, and then A is 
>>>>>> loaded, and then B is loaded.
>>>>>> 
>>>>>> ...or perhaps we can just discuss this on the call tomorrow?
>>>>>> 
>>>>>> -- 
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to: 
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16383.php
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16384.php
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16386.php
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16387.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16388.php
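
For readers who want to reproduce the load-time behaviour discussed in this thread, here is a minimal Python sketch. It is not taken from Open MPI or Slurm. It uses ctypes to dlopen a plugin with RTLD_NOW, so that unresolved symbols should surface at load time, optionally after preloading other libraries with RTLD_GLOBAL so that their symbols become visible to the plugin. The function names try_dlopen and probe_plugin and the library paths in the example are hypothetical; adjust them to your installation.

```python
import ctypes
import os


def try_dlopen(path, make_global=False):
    """Try to dlopen `path`; return (handle, None) or (None, error text)."""
    # RTLD_NOW asks the loader to resolve every undefined symbol right away,
    # so a missing dependency is reported at load time, not at first call.
    scope = ctypes.RTLD_GLOBAL if make_global else ctypes.RTLD_LOCAL
    try:
        return ctypes.CDLL(path, mode=os.RTLD_NOW | scope), None
    except OSError as exc:
        return None, str(exc)


def probe_plugin(plugin, preload=()):
    """Preload `preload` libraries with RTLD_GLOBAL, then load `plugin`.

    Returns a list of (path, error) pairs; error is None on success.
    """
    results = []
    for lib in preload:
        _, err = try_dlopen(lib, make_global=True)
        results.append((lib, err))
    _, err = try_dlopen(plugin)
    results.append((plugin, err))
    return results


if __name__ == "__main__":
    # Hypothetical paths, assumed for illustration only.
    plugin = "/usr/lib64/slurm/auth_munge.so"
    for preload in ([], ["/usr/lib64/libslurm.so"]):
        print("preload:", preload)
        for path, err in probe_plugin(plugin, preload):
            print("  ", path, "->", err or "loaded")
```

Going by the observations quoted above, one would expect preloading libslurm to help in the slurm 2.6.0 case (slurm_verbose is defined in libslurm.so) and to make no difference in the 2.6.6 case, where the missing symbol lives only in the static libcommon.a. Libraries stay loaded for the life of the process, so run the no-preload case first or use a fresh interpreter for each experiment.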
