What version of Open MPI are you running? The error indicates that Open MPI is trying to start a user-level helper daemon (orted) on the remote node, and that daemon is segfaulting (which is unusual).
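For what it's worth, a quick way to check both the version and which build the remote node actually picks up (just a sketch: it assumes the daemon is launched over ssh and that you can reach mimi012 that way):

    # On the node where you run mpirun: report the Open MPI version in use
    mpirun --version
    ompi_info | head -n 5

    # On a failing node, see what a non-interactive shell resolves;
    # the orted daemon there should come from your own build tree
    ssh mimi012 'which mpirun; which orted'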
One thing to be aware of: https://www.open-mpi.org/faq/?category=building#install-overwrite


> On Feb 6, 2017, at 8:14 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
> 
> Hello,
> 
> I cannot run a program with MPI when I compile it myself.
> On some nodes I have the following error:
> ================================================================================
> [mimi012:17730] *** Process received signal ***
> [mimi012:17730] Signal: Segmentation fault (11)
> [mimi012:17730] Signal code: Address not mapped (1)
> [mimi012:17730] Failing at address: 0xf8
> [mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ffff66c0500]
> [mimi012:17730] [ 1] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7ffff781fcb9]
> [mimi012:17730] [ 2] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7ffff197fbcd]
> [mimi012:17730] [ 3] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x7ffff1981e34]
> [mimi012:17730] [ 4] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7ffff197bb1d]
> [mimi012:17730] [ 5] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7ffff782323c]
> [mimi012:17730] [ 6] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x7ffff77c534c]
> [mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x7ffff66b8851]
> [mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7ffff640694d]
> [mimi012:17730] *** End of error message ***
> --------------------------------------------------------------------------
> ORTE has lost communication with its daemon located on node:
> 
>   hostname: mimi012
> 
> This is usually due to either a failure of the TCP network
> connection to the node, or possibly an internal failure of
> the daemon itself. We cannot recover from this failure, and
> therefore will terminate the job.
> --------------------------------------------------------------------------
> ================================================================================
> 
> The error does not appear with the official MPI installed on the
> platform. I asked the admins about their compilation options, but there
> is nothing particular about them.
> 
> Moreover, it appears only for some node lists. Still, the nodes seem to
> be fine, since the official version of MPI on the platform works there.
> 
> To be sure it is not a network problem, I tried to use "-mca btl
> tcp,sm,self" or "-mca btl openib,sm,self", with no change.
> 
> Do you have any idea where this error may come from?
> 
> Thank you.
> 
> 
> Cyril Bordage.
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

-- 
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
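Regarding the install-overwrite FAQ entry above, one more sanity check worth running (again just a sketch: the install path comes from the backtrace, and the ssh assumption may not match how this cluster launches jobs):

    # Compare the remote environment with the build tree from the backtrace
    # (/home/bordage/modules/openmpi/openmpi-debug); a mix of that tree and
    # the system Open MPI install is exactly what the FAQ entry warns about
    ssh mimi012 'echo $PATH; echo $LD_LIBRARY_PATH'

    # Check which libraries the remote orted would actually load
    ssh mimi012 'ldd $(which orted)' | grep -E 'open-pal|open-rte'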