Unfortunately this does not complete the thread: the problem is not
solved! It is not an installation problem. I have no previous
installation, since I use separate directories, and there is nothing
MPI-specific in my environment; I just use the full paths to mpicc and
mpirun.
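
For clarity, a minimal sketch of what this looks like (the prefix is the
one visible in the stack trace quoted below; the program name is
hypothetical):

    # Nothing MPI-specific in the environment; only full paths to the wrappers.
    /home/bordage/modules/openmpi/openmpi-debug/bin/mpicc -o hello hello.c
    /home/bordage/modules/openmpi/openmpi-debug/bin/mpirun -np 2 --host node1,node2 ./hello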

The error depends on which nodes I run on. For example, I can run on
node1 and node2, on node1 and node3, or on node2 and node3, but not on
node1, node2 and node3 together. With the platform's official version
(1.8.1) it works like a charm.
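
As an illustration (full path to mpirun omitted for brevity; the program
name is hypothetical):

    # Any pair of nodes works:
    mpirun -np 2 --host node1,node2 ./hello
    mpirun -np 2 --host node1,node3 ./hello
    mpirun -np 2 --host node2,node3 ./hello
    # All three together crash during daemon startup:
    mpirun -np 3 --host node1,node2,node3 ./hello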

George, perhaps you could see it for yourself by connecting to our
platform (plafrim), since you have an account. That should make it
easier to understand and see our problem.


Cyril.

On 10/02/2017 at 18:15, George Bosilca wrote:
> To complete this thread, the problem is now solved. Some .so files were
> lingering around from a previous installation, causing startup problems.
> 
>   George.
> 
> 
>> On Feb 10, 2017, at 05:38 , Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>
>> Thank you for your answer.
>> I am running the git master version (last tested was cad4c03).
>>
>> FYI, Clément Foyer is talking with George Bosilca about this problem.
>>
>>
>> Cyril.
>>
>> On 08/02/2017 at 16:46, Jeff Squyres (jsquyres) wrote:
>>> What version of Open MPI are you running?
>>>
>>> The error indicates that Open MPI is trying to start a user-level
>>> helper daemon on the remote node, and that the daemon is segfaulting
>>> (which is unusual).
>>>
>>> One thing to be aware of:
>>>
>>>     https://www.open-mpi.org/faq/?category=building#install-overwrite
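
For reference, the gist of that FAQ entry, as a minimal sketch (the
prefix variable is illustrative):

    # Remove (or "make uninstall") a previous installation in the same
    # prefix before installing a new build, so that no stale .so plugins
    # from an older version are picked up at runtime.
    rm -rf "$PREFIX"
    ./configure --prefix="$PREFIX" && make -j && make install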
>>>
>>>
>>>
>>>> On Feb 6, 2017, at 8:14 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I cannot run a program with MPI when I compile it myself.
>>>> On some nodes I get the following error:
>>>> ================================================================================
>>>> [mimi012:17730] *** Process received signal ***
>>>> [mimi012:17730] Signal: Segmentation fault (11)
>>>> [mimi012:17730] Signal code: Address not mapped (1)
>>>> [mimi012:17730] Failing at address: 0xf8
>>>> [mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ffff66c0500]
>>>> [mimi012:17730] [ 1]
>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7ffff781fcb9]
>>>> [mimi012:17730] [ 2]
>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7ffff197fbcd]
>>>> [mimi012:17730] [ 3]
>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x7ffff1981e34]
>>>> [mimi012:17730] [ 4]
>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7ffff197bb1d]
>>>> [mimi012:17730] [ 5]
>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7ffff782323c]
>>>> [mimi012:17730] [ 6]
>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x7ffff77c534c]
>>>> [mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x7ffff66b8851]
>>>> [mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7ffff640694d]
>>>> [mimi012:17730] *** End of error message ***
>>>> --------------------------------------------------------------------------
>>>> ORTE has lost communication with its daemon located on node:
>>>>
>>>> hostname:  mimi012
>>>>
>>>> This is usually due to either a failure of the TCP network
>>>> connection to the node, or possibly an internal failure of
>>>> the daemon itself. We cannot recover from this failure, and
>>>> therefore will terminate the job.
>>>> --------------------------------------------------------------------------
>>>> ================================================================================
>>>>
>>>> The error does not appear with the official MPI installed on the
>>>> platform. I asked the admins about their compilation options, but
>>>> there is nothing unusual about them.
>>>>
>>>> Moreover, it appears only for some node lists. Still, the nodes seem to
>>>> be fine, since they work with the platform's official version of MPI.
>>>>
>>>> To make sure it is not a network problem, I tried using "-mca btl
>>>> tcp,sm,self" and "-mca btl openib,sm,self", with no change.
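
Concretely, something along these lines, with an illustrative host list
and program name; both variants fail in the same way:

    mpirun -mca btl tcp,sm,self    -np 3 --host node1,node2,node3 ./hello
    mpirun -mca btl openib,sm,self -np 3 --host node1,node2,node3 ./hello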
>>>>
>>>> Do you have any idea where this error may come from?
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> Cyril Bordage.
>>>
>>>
> 
> 
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
