I did build with --enable-debug. I also experimented with sending signal 11 to mpirun to see what I get out; in that case, I get a nice backtrace. Weird.
[rvandevaart@drossetti-ivy0 intel_tests]$ mpirun -np 2 sleep 20
[drossetti-ivy0:14033] *** Process (mpirun)received signal ***
[drossetti-ivy0:14033] Signal: Segmentation fault (11)
[drossetti-ivy0:14033] Signal code: (0)
[drossetti-ivy0:14033] Failing at address: 0x7e5500005ace
[drossetti-ivy0:14033] End of signal information - not sleeping
[drossetti-ivy0:14033] *** Return value from opal_backtrace_buffer is 0 ***
[drossetti-ivy0:14033] [ 0] /lib64/libpthread.so.0(+0xf500) [0x7f27b2fd8500]
[drossetti-ivy0:14033] [ 1] /lib64/libc.so.6(__poll+0x53) [0x7f27b2d15293]
[drossetti-ivy0:14033] [ 2] /geppetto/home/rvandevaart/ompi/ompi-v1.7/64-nocuda/lib/libopen-pal.so.6(+0x963e5) [0x7f27b3d283e5]
[drossetti-ivy0:14033] [ 3] /geppetto/home/rvandevaart/ompi/ompi-v1.7/64-nocuda/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x26e) [0x7f27b3d1cdfc]
[drossetti-ivy0:14033] [ 4] mpirun(orterun+0x137d) [0x4052b6]
[drossetti-ivy0:14033] [ 5] mpirun(main+0x20) [0x4037b4]
[drossetti-ivy0:14033] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f27b2c54cdd]
[drossetti-ivy0:14033] [ 7] mpirun() [0x4036d9]
[drossetti-ivy0:14033] *** End of error message ***
[rvandevaart@drossetti-ivy0 intel_tests]$

_____________________
From: devel [devel-boun...@open-mpi.org] On Behalf Of Ralph Castain [r...@open-mpi.org]
Sent: Thursday, January 30, 2014 11:51 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Intermittent mpirun crash?

Huh - not much info there, I'm afraid. I gather you didn't build this with --enable-debug?

On Jan 30, 2014, at 8:26 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> I am seeing this happening to me very intermittently. Looks like mpirun is
> getting a SEGV. Is anyone else seeing this? This is 1.7.4 built yesterday.
> (Note that I added some stuff to what is being printed out, so the message
> is slightly different than the 1.7.4 output.)
>
> mpirun -np 6 -host drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca btl_openib_warn_default_gid_prefix 0 -- `pwd`/src/MPI_Waitsome_p_c
> MPITEST info (0): Starting: MPI_Waitsome_p: Persistent Waitsome using two nodes
> MPITEST_results: MPI_Waitsome_p: Persistent Waitsome using two nodes all tests PASSED (742)
> [drossetti-ivy0:10353] *** Process (mpirun)received signal ***
> [drossetti-ivy0:10353] Signal: Segmentation fault (11)
> [drossetti-ivy0:10353] Signal code: Address not mapped (1)
> [drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
> [drossetti-ivy0:10353] End of signal information - not sleeping
> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
> gmake[1]: Leaving directory
> `/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'
>
> (gdb) where
> #0  0x00007fd31f620807 in ?? () from /lib64/libgcc_s.so.1
> #1  0x00007fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x00007fd31fb2893e in backtrace () from /lib64/libc.so.6
> #3  0x00007fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>     at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
> #4  0x00007fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, p=0x7fd31e5e3800)
>     at ../../../opal/util/stacktrace.c:354
> #5  <signal handler called>
> #6  0x00007fd31e5f208d in ?? ()
> #7  0x00007fd31e5e46d8 in ?? ()
> #8  0x000000000000c2a8 in ?? ()
> #9  0x0000000000000000 in ?? ()

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel