So the child processes are not calling orte_init or anything like that? I can
check it; any chance you can give me a line number from a debug build?

> On Feb 26, 2016, at 11:42 AM, Sylvain Jeaugey <sjeau...@nvidia.com> wrote:
> 
> I got this strange crash on master last night running nv/mpix_test:
> 
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: 0x50
> [ 0] /lib64/libpthread.so.0(+0xf710)[0x7f9f19a80710]
> [ 1] /ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/install/lib/libopen-rte.so.0(orte_util_compare_name_fields+0x81)[0x7f9f1a88f6d7]
> [ 2] /ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/install/lib/openmpi/mca_iof_hnp.so(orte_iof_hnp_read_local_handler+0x247)[0x7f9f1109b4ab]
> [ 3] /ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/install/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0xbf1)[0x7f9f1a5b68f1]
> [ 4] mpirun[0x405649]
> [drossetti-ivy4:31651] [ 5] mpirun[0x403a48]
> [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9f196fbd1d]
> [ 7] mpirun[0x4038e9]
> *** End of error message ***
> 
> This test does not even call MPI_Init/MPI_Finalize, only
> MPIX_Query_cuda_support. So it really is an ORTE race condition, and the
> problem is hard to reproduce: it sometimes takes more than 50 runs, with a
> random sleep between runs, to see it.
> 
> I don't even know whether we want to fix that. What do you think?
> 
> Sylvain
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/02/18635.php
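
The reproduction procedure described in the report above (sometimes more than 50 runs, with a random sleep between runs) can be scripted. Below is a minimal Python sketch; the function name `run_until_failure` and its parameters are hypothetical, not part of Open MPI or MTT, and it assumes a crash surfaces as a nonzero return status (Python's `subprocess` reports death by a signal such as SIGSEGV as a negative code, e.g. -11).

```python
import random
import subprocess
import time


def run_until_failure(cmd, max_runs=100, max_sleep=1.0, seed=None):
    """Run cmd up to max_runs times, sleeping a random interval between runs.

    Returns (run_number, returncode, stderr) for the first run that does not
    exit with status 0, or None if every run succeeds.
    """
    rng = random.Random(seed)
    for run in range(1, max_runs + 1):
        # Capture output so the backtrace of a failing run can be inspected.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Death by signal is reported as a negative code (e.g. -11 for SIGSEGV).
            return run, result.returncode, result.stderr
        # Random pause between runs, as in the manual reproduction procedure.
        time.sleep(rng.uniform(0.0, max_sleep))
    return None
```

For example, `run_until_failure(["mpirun", "-np", "2", "./mpix_test"], max_runs=200)` would keep launching the test until one run fails and return the run number, exit status, and captured stderr (where the backtrace would be expected to appear). The `-np 2` and `./mpix_test` arguments are placeholders for whatever the nv/mpix_test setup actually uses.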
