Another option is to simply add the -lslurm -lauth flags to the link line of 
the pmix/s1 component, as this is the only place that requires them, and it 
won’t hurt anything to do so.
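
For anyone following the RTLD_GLOBAL question quoted below, here is a minimal sketch of the mechanism only. It does not reproduce the Slurm problem. A library loaded with RTLD_GLOBAL makes its symbols available for resolving undefined references in libraries loaded afterwards, which is why loading the dependency globally can paper over a library that was not linked against it. The sketch uses Python's ctypes purely for brevity; in C the call would be dlopen(path, RTLD_NOW | RTLD_GLOBAL). The helper name load_global is hypothetical, and libm is only a stand-in for the missing dependency.

```python
import ctypes
import ctypes.util


def load_global(name):
    """Load a shared library with RTLD_GLOBAL (hypothetical helper).

    Roughly what dlopen(path, RTLD_NOW | RTLD_GLOBAL) does in C: the
    library's symbols become visible to libraries loaded afterwards.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        raise OSError(f"shared library {name!r} not found")
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


# Assumption: libm stands in for the dependency (libslurm in this thread)
# only because it is present on most systems.
libm = load_global("m")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))
```

Adding the flags at link time, as suggested above, avoids the need for any such runtime trick.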


> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
> 
> Jeff,
> 
> FWIW, you can read my analysis of what is going wrong at
> http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php 
> 
> bottom line, I agree this is a Slurm issue (the Slurm plugins should depend
> on libslurm, but they do not yet)
> 
> a possible workaround would be to make the pmi component a "proxy" that
> dlopens, with RTLD_GLOBAL, the "real" component in which the work is done.
> that being said, the impact is quite limited (no direct launch in slurm
> with pmi1, but pmi2 works fine), so it makes sense not to work around
> someone else's problem.
> and that being said, configure could detect this broken pmi1 and not
> build pmi1 support, or print a user-friendly error message if pmi1 is used.
> 
> any thoughts ?
> 
> Cheers,
> 
> Gilles
> 
> On 2014/12/02 7:47, Jeff Squyres (jsquyres) wrote:
>> Ok, if the problem is moot, great.
>> 
>> (sidenote: this is moot, so ignore this if you want: with this explanation, 
>> I'm still not sure how RTLD_GLOBAL fixes the issue)
>> 
>> 
>> On Dec 1, 2014, at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Easy enough to explain. We link libpmi into the pmix/s1 component. This 
>>> library is missing the linkage to libslurm that contains the linkage to 
>>> libauth where munge resides. So when we call a PMI function, libpmi 
>>> references a call to munge for authentication and hits an “unresolved 
>>> symbol” error.
>>> 
>>> Moe acknowledges the error is in Slurm and is fixing the linkages so this 
>>> problem goes away.
>>> 
>>> 
>>>> On Dec 1, 2014, at 2:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>> wrote:
>>>> 
>>>> On Dec 1, 2014, at 5:07 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against its 
>>>>> dependencies (the pmi-2 one is correct).  Moe is aware of the problem and 
>>>>> fixing it on their side. This won’t help existing installations until 
>>>>> they upgrade, but I tend to agree with Jeff about not fixing other 
>>>>> people’s problems.
>>>> Can you explain what is happening?
>>>> 
>>>> I ask because I'm not sure I understand the problem such that using 
>>>> RTLD_GLOBAL would fix it.  I.e., even if libpmi1.so isn't linked against 
>>>> its dependencies properly, that shouldn't cause a problem if OMPI 
>>>> components A and B are both linked against libpmi1.so, and then A is 
>>>> loaded, and then B is loaded.
>>>> 
>>>> ...or perhaps we can just discuss this on the call tomorrow?
>>>> 
>>>> -- 
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to: 
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16383.php
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16384.php
>> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16386.php
