Hello,

I cannot run a program with MPI when I use an Open MPI that I compiled myself. On some nodes I get the following error:

================================================================================
[mimi012:17730] *** Process received signal ***
[mimi012:17730] Signal: Segmentation fault (11)
[mimi012:17730] Signal code: Address not mapped (1)
[mimi012:17730] Failing at address: 0xf8
[mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ffff66c0500]
[mimi012:17730] [ 1] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7ffff781fcb9]
[mimi012:17730] [ 2] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7ffff197fbcd]
[mimi012:17730] [ 3] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x7ffff1981e34]
[mimi012:17730] [ 4] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7ffff197bb1d]
[mimi012:17730] [ 5] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7ffff782323c]
[mimi012:17730] [ 6] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x7ffff77c534c]
[mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x7ffff66b8851]
[mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7ffff640694d]
[mimi012:17730] *** End of error message ***
--------------------------------------------------------------------------
ORTE has lost communication with its daemon located on node:
  hostname:  mimi012

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
================================================================================

The error does not appear with the official Open MPI installed on the platform. I asked the admins about their compilation options, but there is nothing particular about them. Moreover, the error appears only with certain node lists. Still, the nodes themselves seem to be fine, since everything works with the platform's official MPI. To make sure it is not a network problem, I tried "-mca btl tcp,sm,self" and "-mca btl openib,sm,self", with no change.

Do you have any idea where this error may come from?

Thank you.

Cyril Bordage.
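For reference, even a minimal MPI program should be enough to exercise the launch path that crashes here, since the backtrace is in the ORTE oob/tcp layer rather than in the application. This is only a sketch: the file name, the second host name, and the process count below are placeholders, not the actual job.

    /* hello.c - minimal MPI test program */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Built and launched with the self-compiled Open MPI, something like:

    mpicc hello.c -o hello
    mpirun -np 2 --host mimi012,mimi013 -mca btl tcp,sm,self ./hello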