On 15/11/2013 17:50, Ralph Castain wrote:
> Hmm...well, that will make debug a tad more difficult. I've attached a
> patch that *should* stop the segfault. Given that behavior, though, it
> looks like the system isn't finding either rsh or ssh on your machine.
> Might be the root cause of the problem.

With your patch:

$ ./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 -mcarmaps_base_verbose 5 -mca ess_base_verbose 5 -c 4 foo
[merulo:08821] mca:base:select:( plm) Querying component [rsh]
[merulo:08821] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh : rsh path NULL
[merulo:08821] *** Process received signal ***
[merulo:08821] Signal: Segmentation fault (11)
[merulo:08821] Signal code: Invalid permissions (2)
[merulo:08821] Failing at address: (nil)
[merulo:08821] [ 0] linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) [0xa000000000040800]
[merulo:08821] [ 1] /home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3b0) [0x2000000000867f30]
[merulo:08821] [ 2] /home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110) [0x20000000001ddea0]
[merulo:08821] [ 3] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0) [0x20000000001392f0]
[merulo:08821] [ 4] /home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0) [0x20000000008316f0]
[merulo:08821] [ 5] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10) [0x200000000008e0c0]
[merulo:08821] [ 6] ./mpirun(orterun+0x1fffffffff84cc80) [0x4000000000006c60]
[merulo:08821] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
[merulo:08821] [ 8] /lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50) [0x20000000004bd2a0]
[merulo:08821] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
[merulo:08821] *** End of error message ***
Segmentation fault
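The "rsh path NULL" line above is consistent with neither agent being found. One way to check that independently of Open MPI is to search PATH for both executables. The following is only a minimal sketch in Python, assuming the standard executable names ssh and rsh; the helper name find_launch_agents is made up for illustration, and this is not how the rsh component itself performs the lookup.

```python
import shutil


def find_launch_agents(agents=("ssh", "rsh"), path=None):
    """Return {agent: absolute path or None} by searching PATH.

    Hypothetical helper: it only mirrors the idea of "is the agent
    on PATH?", not Open MPI's own lookup code.
    """
    # shutil.which returns None when the executable is not found.
    return {agent: shutil.which(agent, path=path) for agent in agents}


if __name__ == "__main__":
    for agent, location in find_launch_agents().items():
        print(f"{agent}: {location if location else 'not found in PATH'}")
```

If both entries come back as not found, that would match the suspicion in the quoted message that no launch agent is installed or reachable from this shell.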
bt:

Program received signal SIGSEGV, Segmentation fault.
0x2000000000867f30 in orte_plm_rsh_component_query (
    module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
    at plm_rsh_component.c:205
205         OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
(gdb) bt
#0  0x2000000000867f30 in orte_plm_rsh_component_query (
    module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
    at plm_rsh_component.c:205
#1  0x20000000001ddea0 in mca_base_select (
    type_name=0x200000000026e708 "plm", output_id=8,
    components_available=0x20000000002c5f08 <orte_plm_base>,
    best_module=0x60000fffffffb0f0, best_component=0x60000fffffffb0f8)
    at mca_base_components_select.c:76
#2  0x20000000001392f0 in orte_plm_base_select ()
    at base/plm_base_select.c:46
#3  0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
#4  0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb370,
    pargv=0x60000fffffffb378, flags=4) at runtime/orte_init.c:127
#5  0x4000000000006c60 in orterun (argc=15, argv=0x60000fffffffb628)
    at orterun.c:693
#6  0x40000000000045e0 in main (argc=15, argv=0x60000fffffffb628)
    at main.c:13

S