So the child processes are not calling orte_init or anything like that? I can check it - any chance you can give me a line number via a debug build?
> On Feb 26, 2016, at 11:42 AM, Sylvain Jeaugey <sjeau...@nvidia.com> wrote:
>
> I got this strange crash on master this night running nv/mpix_test :
>
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: 0x50
> [ 0] /lib64/libpthread.so.0(+0xf710)[0x7f9f19a80710]
> [ 1] /ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/install/lib/libopen-rte.so.0(orte_util_compare_name_fields+0x81)[0x7f9f1a88f6d7]
> [ 2] /ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/install/lib/openmpi/mca_iof_hnp.so(orte_iof_hnp_read_local_handler+0x247)[0x7f9f1109b4ab]
> [ 3] /ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/install/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0xbf1)[0x7f9f1a5b68f1]
> [ 4] mpirun[0x405649]
> [drossetti-ivy4:31651] [ 5] mpirun[0x403a48]
> [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9f196fbd1d]
> [ 7] mpirun[0x4038e9]
> *** End of error message ***
>
> This test is not even calling MPI_Init/Finalize, only
> MPIX_Query_cuda_support. So it is really an ORTE race condition, and the
> problem is hard to reproduce. It takes sometimes more than 50 runs with
> random sleep between runs to see the problem.
>
> I don't even know if we want to fix that -- what do you think ?
>
> Sylvain
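
Since the quoted report says the crash only shows up after many runs with a random sleep in between, here is a minimal sketch of how that reproduction loop could be scripted. It is an illustration only, not the harness that was actually used: the function name run_until_failure, its parameters, and the example mpirun command line in the comment are all assumptions.

```python
import random
import subprocess
import time


def run_until_failure(cmd, max_runs=200, max_sleep=2.0):
    """Run cmd repeatedly with a random pause between runs.

    Returns (run_number, returncode) for the first failing run,
    or None if every run exited with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        # On POSIX a negative return code means the child was killed by a
        # signal (for example -11 for SIGSEGV). Any nonzero code is treated
        # as a failure here.
        if result.returncode != 0:
            return attempt, result.returncode
        # Random sleep between runs, as described in the report.
        time.sleep(random.uniform(0.0, max_sleep))
    return None


# Hypothetical usage (the exact command line is an assumption):
#   run_until_failure(["mpirun", "-np", "2", "./mpix_test"])
```

Stopping at the first nonzero exit keeps the failing run's output as the last thing on the terminal, which is convenient when the goal is to capture a backtrace from a debug build.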