That option might explain why your test process is failing (it segfaulted as well), but it obviously wouldn't have anything to do with mpirun itself.
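For reference, mpi_leave_pinned controls Open MPI's registered-memory ("pinned") cache for RDMA transfers, and it can be toggled per run without rebuilding. Combining Rolf's flag with the command quoted below, a run of the same test with the cache disabled would look roughly like this (hosts and binary path taken from the quoted command):

mpirun -np 6 -host drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 \
    --mca mpi_leave_pinned 0 --mca btl_openib_warn_default_gid_prefix 0 \
    -- `pwd`/src/MPI_Waitsome_p_c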
On Jan 30, 2014, at 9:29 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> I just retested with --mca mpi_leave_pinned 0 and that made no difference.
> I still see the mpirun crash.
>
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>> Sent: Thursday, January 30, 2014 11:59 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Intermittent mpirun crash?
>>
>> I got something similar 2 days ago, with a large software package abusing
>> MPI_Waitany/MPI_Waitsome (it was working seamlessly a month ago). I had to
>> find a quick fix. Upon figuring out that turning leave_pinned off fixes the
>> problem, I did not investigate any further.
>>
>> Do you see a similar behavior?
>>
>> George.
>>
>> On Jan 30, 2014, at 17:26, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>
>>> I am seeing this happen very intermittently. It looks like mpirun is
>>> getting a SEGV. Is anyone else seeing this?
>>> This is 1.7.4 built yesterday. (Note that I added some information to what
>>> is being printed out, so the message is slightly different from the 1.7.4
>>> output.)
>>>
>>> mpirun -np 6 -host
>>> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
>>> btl_openib_warn_default_gid_prefix 0 -- `pwd`/src/MPI_Waitsome_p_c
>>> MPITEST info (0): Starting: MPI_Waitsome_p: Persistent Waitsome using two nodes
>>> MPITEST_results: MPI_Waitsome_p: Persistent Waitsome using two nodes all tests PASSED (742)
>>> [drossetti-ivy0:10353] *** Process (mpirun) received signal ***
>>> [drossetti-ivy0:10353] Signal: Segmentation fault (11)
>>> [drossetti-ivy0:10353] Signal code: Address not mapped (1)
>>> [drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
>>> [drossetti-ivy0:10353] End of signal information - not sleeping
>>> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
>>> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'
>>>
>>> (gdb) where
>>> #0  0x00007fd31f620807 in ?? () from /lib64/libgcc_s.so.1
>>> #1  0x00007fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
>>> #2  0x00007fd31fb2893e in backtrace () from /lib64/libc.so.6
>>> #3  0x00007fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>>>     at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
>>> #4  0x00007fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, p=0x7fd31e5e3800)
>>>     at ../../../opal/util/stacktrace.c:354
>>> #5  <signal handler called>
>>> #6  0x00007fd31e5f208d in ?? ()
>>> #7  0x00007fd31e5e46d8 in ?? ()
>>> #8  0x000000000000c2a8 in ?? ()
>>> #9  0x0000000000000000 in ?? ()
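For context on the communication pattern involved: "Persistent Waitsome" means the test builds persistent requests with MPI_Send_init/MPI_Recv_init, starts them, and then drains completions with MPI_Waitsome in a loop. The following is only a minimal sketch of that pattern, not the actual intel_tests source; the tag value and buffer layout are invented for illustration.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int nreq = size - 1;                     /* one receive per peer */
        int *recvbuf      = malloc(nreq * sizeof(int));
        int *indices      = malloc(nreq * sizeof(int));
        MPI_Request *reqs = malloc(nreq * sizeof(MPI_Request));

        /* Build persistent receive requests, then start them all. */
        for (int i = 0; i < nreq; i++)
            MPI_Recv_init(&recvbuf[i], 1, MPI_INT, i + 1, 99,
                          MPI_COMM_WORLD, &reqs[i]);
        MPI_Startall(nreq, reqs);

        /* Drain completions in batches; completed persistent requests
         * become inactive and are ignored by later MPI_Waitsome calls. */
        int done = 0;
        while (done < nreq) {
            int outcount;
            MPI_Waitsome(nreq, reqs, &outcount, indices, MPI_STATUSES_IGNORE);
            if (outcount != MPI_UNDEFINED)
                done += outcount;
        }
        printf("rank 0: drained %d persistent receives\n", done);

        for (int i = 0; i < nreq; i++)
            MPI_Request_free(&reqs[i]);
        free(recvbuf);
        free(indices);
        free(reqs);
    } else {
        /* Every other rank sends its rank number to rank 0 once. */
        int payload = rank;
        MPI_Request req;
        MPI_Send_init(&payload, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }

    MPI_Finalize();
    return 0;
}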