To close out this thread: the problem is now solved. Some .so files were lingering around from a previous installation, causing the startup problem.

George.
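For anyone landing on this thread with the same symptom: the FAQ entry Jeff links below covers exactly this trap (installing one version of Open MPI over another leaves stale plugins behind). A minimal clean-reinstall sketch, assuming the install prefix shown in the backtrace below; the build-directory name is hypothetical:

    # Wipe the old installation tree completely before "make install",
    # so that no stale .so plugins survive under lib/openmpi/.
    # The prefix matches the backtrace; the build dir name is made up.
    rm -rf $HOME/modules/openmpi/openmpi-debug
    cd ~/src/ompi/build
    make install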
> On Feb 10, 2017, at 05:38, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>
> Thank you for your answer.
> I am running the git master version (last tested was cad4c03).
>
> FYI, Clément Foyer is talking with George Bosilca about this problem.
>
>
> Cyril.
>
> On 08/02/2017 at 16:46, Jeff Squyres (jsquyres) wrote:
>> What version of Open MPI are you running?
>>
>> The error indicates that Open MPI is trying to start a user-level helper
>> daemon on the remote node, and the daemon is segfaulting (which is unusual).
>>
>> One thing to be aware of:
>>
>> https://www.open-mpi.org/faq/?category=building#install-overwrite
>>
>>
>>
>>> On Feb 6, 2017, at 8:14 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>>
>>> Hello,
>>>
>>> I cannot run a program with MPI when I compile it myself.
>>> On some nodes I have the following error:
>>> ================================================================================
>>> [mimi012:17730] *** Process received signal ***
>>> [mimi012:17730] Signal: Segmentation fault (11)
>>> [mimi012:17730] Signal code: Address not mapped (1)
>>> [mimi012:17730] Failing at address: 0xf8
>>> [mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ffff66c0500]
>>> [mimi012:17730] [ 1] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7ffff781fcb9]
>>> [mimi012:17730] [ 2] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7ffff197fbcd]
>>> [mimi012:17730] [ 3] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x7ffff1981e34]
>>> [mimi012:17730] [ 4] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7ffff197bb1d]
>>> [mimi012:17730] [ 5] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7ffff782323c]
>>> [mimi012:17730] [ 6] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x7ffff77c534c]
>>> [mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x7ffff66b8851]
>>> [mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7ffff640694d]
>>> [mimi012:17730] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> ORTE has lost communication with its daemon located on node:
>>>
>>>   hostname: mimi012
>>>
>>> This is usually due to either a failure of the TCP network
>>> connection to the node, or possibly an internal failure of
>>> the daemon itself. We cannot recover from this failure, and
>>> therefore will terminate the job.
>>> --------------------------------------------------------------------------
>>> ================================================================================
>>>
>>> The error does not appear with the official MPI installed on the
>>> platform. I asked the admins about their compilation options, but there
>>> is nothing unusual.
>>>
>>> Moreover, it appears only for some node lists. Still, the nodes seem to
>>> be fine since it works with the official version of MPI on the platform.
>>>
>>> To be sure it is not a network problem, I tried to use "-mca btl
>>> tcp,sm,self" or "-mca btl openib,sm,self" with no change.
>>>
>>> Do you have any idea where this error may come from?
>>>
>>> Thank you.
>>>
>>>
>>> Cyril Bordage.
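A quick way to spot this kind of stale-library mix before it segfaults is to check what the helper daemon (orted, the process the backtrace above comes from) actually resolves at load time. A sketch, assuming the same prefix as above; only standard ldd and ls are used:

    # The core libraries should all resolve inside the intended prefix;
    # a libopen-pal or libopen-rte pointing elsewhere means a stale
    # installation is being picked up.
    ldd $HOME/modules/openmpi/openmpi-debug/bin/orted | grep 'libopen-'

    # MCA plugins (mca_*.so) are loaded from here at runtime; files with
    # older timestamps than the rest are leftovers from a previous build.
    ls -lt $HOME/modules/openmpi/openmpi-debug/lib/openmpi/ | head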
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel