It does look similar. The question is: why didn't this fix the problem? Will have
to investigate.

Thanks


> On Dec 2, 2014, at 3:17 AM, Artem Polyakov <artpo...@gmail.com> wrote:
> 
> 
> 
> 2014-12-02 17:13 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
> Hmmm... if that is true, then it didn't fix this problem, as it is being
> reported on master.
> 
> I had this problem on my laptop installation. You can check my report (it was
> fairly detailed) and see whether you are hitting the same issue. My fix was
> also included in the 1.8 branch. I am not sure that this is the same issue,
> but they look similar.
>  
> 
> 
>> On Dec 1, 2014, at 9:40 PM, Artem Polyakov <artpo...@gmail.com> wrote:
>> 
>> I think this might be related to the configuration problem I was fixing with
>> Jeff a few months ago. Refer here:
>> https://github.com/open-mpi/ompi/pull/240
>> 
>> 2014-12-02 10:15 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
>> If it isn't too much trouble, it would be good to confirm that it remains
>> broken. I strongly suspect it is, based on Moe's comments.
>> 
>> Obviously, other people are making this work. For Intel MPI, all you do is 
>> point it at libpmi and they can run. However, they do explicitly dlopen it 
>> in their code, and I don’t know what flags they might pass when they do so.
>> 
>> If necessary, I suppose we could follow that pattern. In other words, rather 
>> than specifically linking the “s1” component to libpmi, instead require that 
>> the user point us to a pmi library via an MCA param, then explicitly dlopen 
>> that library with RTLD_GLOBAL. This avoids the issues cited by Jeff, but 
>> resolves the pmi linkage problem.
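A minimal C sketch of the dlopen approach proposed above, assuming a POSIX system with a glibc-style loader. PMI_LIBRARY_PATH is a hypothetical stand-in for the proposed MCA param, and the function names are illustrative rather than taken from the Open MPI tree. The only PMI-1 symbol it looks up is PMI_Init(int *spawned).

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* PMI-1 entry point: int PMI_Init(int *spawned); */
typedef int (*pmi_init_fn)(int *spawned);

/* Open the user-specified PMI library with RTLD_GLOBAL so that its symbols
 * are visible to objects loaded later (for example, plugins that the PMI
 * library itself dlopen()s). Returns NULL on failure. */
void *open_pmi_library(const char *path)
{
    if (path == NULL || path[0] == '\0') {
        /* dlopen(NULL, ...) would return the main program, so refuse it. */
        fprintf(stderr, "no PMI library specified\n");
        return NULL;
    }
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "cannot load %s: %s\n", path, err ? err : "unknown error");
    }
    return handle;
}

/* Resolve PMI_Init from a handle returned by open_pmi_library(). */
pmi_init_fn find_pmi_init(void *handle)
{
    if (handle == NULL) {
        return NULL;
    }
    dlerror(); /* clear any stale error state */
    void *sym = dlsym(handle, "PMI_Init");
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "PMI_Init not found: %s\n", err ? err : "symbol is NULL");
        return NULL;
    }
    /* POSIX requires dlsym() results to be convertible to function pointers. */
    return (pmi_init_fn)sym;
}

/* Hypothetical: take the library path from an environment variable. */
void *open_pmi_from_env(void)
{
    return open_pmi_library(getenv("PMI_LIBRARY_PATH"));
}
```

A caller would then do something like `int spawned; pmi_init_fn init = find_pmi_init(open_pmi_from_env()); if (init) init(&spawned);`. This only helps if the symbols the Slurm plugins need live in a library that actually gets loaded this way; Gilles reports later in the thread that this does not hold for Slurm 2.6.6.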
>> 
>> 
>>> On Dec 1, 2014, at 8:09 PM, Gilles Gouaillardet
>>> <gilles.gouaillar...@iferc.org> wrote:
>>> 
>>> $ srun --version
>>> slurm 2.6.6-VENDOR_PROVIDED
>>> 
>>> $ srun --mpi=pmi2 -n 1 ~/hw
>>> I am 0 / 1
>>> 
>>> $ srun -n 1 ~/hw
>>> /csc/home1/gouaillardet/hw: symbol lookup error: 
>>> /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_verbose
>>> srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
>>> srun: error: slurm_receive_msg[10.0.3.15]: Zero Bytes were transmitted or 
>>> received
>>> srun: error: soleil: task 0: Exited with exit code 127
>>> 
>>> $ ldd /usr/lib64/slurm/auth_munge.so
>>>     linux-vdso.so.1 =>  (0x00007fff54478000)
>>>     libmunge.so.2 => /usr/lib64/libmunge.so.2 (0x00007f744760f000)
>>>     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f74473f1000)
>>>     libc.so.6 => /lib64/libc.so.6 (0x00007f744705d000)
>>>     /lib64/ld-linux-x86-64.so.2 (0x0000003bf5400000)
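The ldd output shows that auth_munge.so does not list libslurm as a dependency, so a symbol such as slurm_verbose can only be resolved if something has already made it globally visible. One way to reproduce this outside of srun, sketched in C under the assumption of a glibc-style dynamic loader, is to dlopen the plugin with RTLD_NOW from a program that does not link libslurm; dlerror() should then name the unresolved symbol. check_plugin_loads is a hypothetical helper.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to load a plugin on its own, forcing every undefined symbol to be
 * resolved at load time (RTLD_NOW) instead of at first call. Returns 0 if
 * the plugin loads cleanly, -1 otherwise; on failure dlerror() typically
 * reports the first symbol that could not be resolved. */
int check_plugin_loads(const char *path)
{
    if (path == NULL) {
        return -1; /* dlopen(NULL, ...) would return the main program */
    }
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "%s\n", err ? err : "unknown dlopen error");
        return -1;
    }
    dlclose(handle);
    return 0;
}
```

For example, check_plugin_loads("/usr/lib64/slurm/auth_munge.so") called from a program like ~/hw would be expected to fail the same way the srun output shows. Running `ldd -r` on the plugin should give a similar report from the command line.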
>>> 
>>> 
>>> Now, if I relink auth_munge.so so that it depends on libslurm:
>>> 
>>> $ srun -n 1 ~/hw
>>> srun: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined 
>>> symbol: slurm_auth_get_arg_desc
>>> 
>>> 
>>> I can give the latest Slurm a try if needed.
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> 
>>> On 2014/12/02 12:56, Ralph Castain wrote:
>>>> Out of curiosity - how are you testing these? I have more current versions 
>>>> of Slurm and would like to test the observations there.
>>>> 
>>>>> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet
>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>> 
>>>>> I'd like to take a step back...
>>>>>
>>>>> I previously tested with Slurm 2.6.0, and it complained about the
>>>>> slurm_verbose symbol, which is defined in libslurm.so,
>>>>> so with Slurm 2.6.0, RTLD_GLOBAL or relinking is OK.
>>>>>
>>>>> Now I tested with Slurm 2.6.6, and it complains about the
>>>>> slurm_auth_get_arg_desc symbol. This symbol is not
>>>>> defined in any dynamic library; it is defined internally in the static
>>>>> libcommon.a library, which is used to build the Slurm binaries.
>>>>>
>>>>> As far as I understand, auth_munge.so can only be invoked from a Slurm
>>>>> binary, which means it cannot be invoked from an MPI application,
>>>>> even if the application is linked with libslurm, libpmi, ...
>>>>>
>>>>> That looks like a Slurm design issue that the Slurm folks will take care
>>>>> of.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> On 2014/12/02 12:33, Ralph Castain wrote:
>>>>>> Another option is to simply add the -lslurm -lauth flags to the pmix/s1 
>>>>>> component as this is the only place that requires it, and it won’t hurt 
>>>>>> anything to do so.
>>>>>> 
>>>>>> 
>>>>>>> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet
>>>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>>> 
>>>>>>> Jeff,
>>>>>>> 
>>>>>>> FWIW, you can read my analysis of what is going wrong at
>>>>>>> http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php
>>>>>>> 
>>>>>>> Bottom line, I agree this is a Slurm issue (Slurm plugins should depend
>>>>>>> on libslurm, but they do not, yet).
>>>>>>>
>>>>>>> A possible workaround would be to make the pmi component a "proxy" that
>>>>>>> dlopens, with RTLD_GLOBAL, the "real" component that does the actual work.
>>>>>>> That being said, the impact is quite limited (no direct launch in Slurm
>>>>>>> with pmi1, but pmi2 works fine), so it makes sense not to work around
>>>>>>> someone else's problem.
>>>>>>> Alternatively, configure could detect this broken pmi1 and either not
>>>>>>> build pmi1 support or print a user-friendly error message if pmi1 is
>>>>>>> used.
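A rough C sketch of the "proxy" idea combined with a friendlier error message, assuming the real PMI-1 code is built as a separate shared object. The path argument, the exported symbol name real_pmi1_component, and the function name are all hypothetical and not taken from the Open MPI tree.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical symbol exported by the "real" PMI-1 component. */
#define REAL_COMPONENT_SYMBOL "real_pmi1_component"

/* Load the real component with RTLD_GLOBAL so that, on glibc, it and the
 * libraries it depends on are visible to objects loaded later. Returns the
 * component symbol, or NULL after printing a hint for the user. The handle
 * is intentionally kept open on success because the component must stay
 * loaded while it is in use. */
void *proxy_load_real_component(const char *real_path)
{
    if (real_path == NULL) {
        return NULL;
    }
    void *handle = dlopen(real_path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr,
                "PMI-1 support could not be loaded (%s).\n"
                "If you are launching with srun, try --mpi=pmi2 instead.\n",
                err ? err : "unknown error");
        return NULL;
    }
    dlerror(); /* clear stale error state before dlsym */
    void *sym = dlsym(handle, REAL_COMPONENT_SYMBOL);
    if (sym == NULL) {
        fprintf(stderr, "%s does not export %s\n", real_path, REAL_COMPONENT_SYMBOL);
        dlclose(handle);
        return NULL;
    }
    return sym;
}
```

This only catches failures while the component itself is being loaded. A "symbol lookup error" raised later during lazy binding, like the one in the srun output earlier in the thread, terminates the process (exit code 127) and cannot be intercepted this way, which is one argument for doing the detection in configure instead.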
>>>>>>> 
>>>>>>> Any thoughts?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Gilles
>>>>>>> 
>>>>>>> On 2014/12/02 7:47, Jeff Squyres (jsquyres) wrote:
>>>>>>>> Ok, if the problem is moot, great.
>>>>>>>> 
>>>>>>>> (sidenote: this is moot, so ignore this if you want: with this 
>>>>>>>> explanation, I'm still not sure how RTLD_GLOBAL fixes the issue)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Dec 1, 2014, at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>> 
>>>>>>>>> Easy enough to explain. We link libpmi into the pmix/s1 component. 
>>>>>>>>> This library is missing the linkage to libslurm that contains the 
>>>>>>>>> linkage to libauth where munge resides. So when we call a PMI 
>>>>>>>>> function, libpmi references a call to munge for authentication and 
>>>>>>>>> hits an “unresolved symbol” error.
>>>>>>>>> 
>>>>>>>>> Moe acknowledges the error is in Slurm and is fixing the linkages so
>>>>>>>>> this problem goes away.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Dec 1, 2014, at 2:13 PM, Jeff Squyres (jsquyres)
>>>>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Dec 1, 2014, at 5:07 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly 
>>>>>>>>>>> against its dependencies (the pmi-2 one is correct).  Moe is aware 
>>>>>>>>>>> of the problem and fixing it on their side. This won’t help 
>>>>>>>>>>> existing installations until they upgrade, but I tend to agree with 
>>>>>>>>>>> Jeff about not fixing other people’s problems.
>>>>>>>>>> Can you explain what is happening?
>>>>>>>>>> 
>>>>>>>>>> I ask because I'm not sure I understand the problem such that using 
>>>>>>>>>> RTLD_GLOBAL would fix it.  I.e., even if libpmi1.so isn't linked 
>>>>>>>>>> against its dependencies properly, that shouldn't cause a problem if 
>>>>>>>>>> OMPI components A and B are both linked against libpmi1.so, and then 
>>>>>>>>>> A is loaded, and then B is loaded.
>>>>>>>>>> 
>>>>>>>>>> ...or perhaps we can just discuss this on the call tomorrow?
>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> Jeff Squyres
>>>>>>>>>> jsquy...@cisco.com
>>>>>>>>>> For corporate legal information go to:
>>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>> Link to this post:
>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16383.php
>>>>>>>>> _______________________________________________
>>>>>>>>> Link to this post:
>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16384.php
>>>>>>> _______________________________________________
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16386.php
>>>>>> _______________________________________________
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16387.php
>>>>> _______________________________________________
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16388.php
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2014/12/16389.php
>>> _______________________________________________
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16390.php
>> 
>> _______________________________________________
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16391.php
>> 
>> 
>> 
>> -- 
>> Best regards, Artem Y. Polyakov
>> _______________________________________________
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16393.php
> 
> _______________________________________________
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16395.php
> 
> 
> 
> -- 
> Best regards, Artem Y. Polyakov
> _______________________________________________
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16396.php
