If you go to line 205 of plm_rsh_component.c and delete the
OPAL_OUTPUT_VERBOSE statement there, the segfault should go away.
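
If you would rather keep the debug output, a guard is another option. This
is only a sketch, and it assumes (not confirmed in this thread) that the
print crashes because it formats a path pointer that is NULL when no agent
is found. The names str_or_null and format_lookup are hypothetical
stand-ins, not Open MPI functions:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: substitute a placeholder for a NULL string so the
 * result can be passed safely to a "%s" conversion. */
static const char *str_or_null(const char *s)
{
    return (s != NULL) ? s : "NULL";
}

/* Hypothetical stand-in for the verbose print: formats the lookup result
 * into buf instead of handing a possibly-NULL pointer to the formatter. */
static int format_lookup(char *buf, size_t len,
                         const char *agent, const char *path)
{
    return snprintf(buf, len, "rsh_lookup on agent %s: path %s",
                    str_or_null(agent), str_or_null(path));
}
```

With agent "ssh" and a NULL path, this formats the literal text "NULL" for
the path rather than dereferencing the pointer.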

That won't solve the root problem, though: mpirun will just exit cleanly
with an error message. The underlying issue is that we aren't finding ssh
or rsh in your PATH. Do you have one or both of those installed?
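
To check this outside of mpirun, something like the following could be
used. It is a minimal sketch of a PATH search, not the lookup code Open MPI
actually uses; find_in_path is a hypothetical helper, and it skips empty
PATH entries for simplicity:

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): look for an executable regular
 * file called `name` in each colon-separated directory of `path_env`.
 * Returns 1 and writes the full path to `out` on success, 0 otherwise. */
static int find_in_path(const char *name, const char *path_env,
                        char *out, size_t outlen)
{
    struct stat st;
    const char *p = path_env;

    if (name == NULL || path_env == NULL || outlen == 0) {
        return 0;
    }
    while (*p != '\0') {
        size_t dirlen = strcspn(p, ":");   /* length of this PATH entry */
        if (dirlen > 0) {                  /* empty entries are skipped */
            int n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, p, name);
            if (n > 0 && (size_t)n < outlen &&
                stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
                access(out, X_OK) == 0) {
                return 1;
            }
        }
        p += dirlen;
        if (*p == ':') {
            p++;                           /* step over the separator */
        }
    }
    out[0] = '\0';                         /* nothing found */
    return 0;
}
```

Calling it once with "ssh" and once with "rsh", passing getenv("PATH") as
path_env, shows whether either agent is visible to the process. If both
calls return 0, that would match the "rsh path NULL" line in your output.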


On Nov 16, 2013, at 2:33 AM, Sylvestre Ledru <sylves...@debian.org> wrote:

> On 15/11/2013 17:50, Ralph Castain wrote:
>> Hmm...well, that will make debug a tad more difficult. I've attached a patch 
>> that *should* stop the segfault. Given that behavior, though, it looks like 
>> the system isn't finding either rsh or ssh on your machine. Might be the 
>> root cause of the problem.
> With your patch:
> $ ./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 
> -mca rmaps_base_verbose 5 -mca ess_base_verbose 5 -c 4 foo
> [merulo:08821] mca:base:select:(  plm) Querying component [rsh]
> [merulo:08821] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh : rsh 
> path NULL
> [merulo:08821] *** Process received signal ***
> [merulo:08821] Signal: Segmentation fault (11)
> [merulo:08821] Signal code: Invalid permissions (2)
> [merulo:08821] Failing at address: (nil)
> [merulo:08821] [ 0] linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) 
> [0xa000000000040800]
> [merulo:08821] [ 1] 
> /home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3b0)
>  [0x2000000000867f30]
> [merulo:08821] [ 2] 
> /home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110) 
> [0x20000000001ddea0]
> [merulo:08821] [ 3] 
> /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0) 
> [0x20000000001392f0]
> [merulo:08821] [ 4] 
> /home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0) 
> [0x20000000008316f0]
> [merulo:08821] [ 5] 
> /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10) 
> [0x200000000008e0c0]
> [merulo:08821] [ 6] ./mpirun(orterun+0x1fffffffff84cc80) [0x4000000000006c60]
> [merulo:08821] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
> [merulo:08821] [ 8] 
> /lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50) 
> [0x20000000004bd2a0]
> [merulo:08821] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
> [merulo:08821] *** End of error message ***
> Segmentation fault
> 
> bt:
> Program received signal SIGSEGV, Segmentation fault.
> 0x2000000000867f30 in orte_plm_rsh_component_query (
>     module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
>     at plm_rsh_component.c:205
> 205            OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
> (gdb) bt
> #0  0x2000000000867f30 in orte_plm_rsh_component_query (
>     module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
>     at plm_rsh_component.c:205
> #1  0x20000000001ddea0 in mca_base_select (
>     type_name=0x200000000026e708 "plm", output_id=8, 
>     components_available=0x20000000002c5f08 <orte_plm_base>, 
>     best_module=0x60000fffffffb0f0, best_component=0x60000fffffffb0f8)
>     at mca_base_components_select.c:76
> #2  0x20000000001392f0 in orte_plm_base_select () at base/plm_base_select.c:46
> #3  0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
> #4  0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb370, 
>     pargv=0x60000fffffffb378, flags=4) at runtime/orte_init.c:127
> #5  0x4000000000006c60 in orterun (argc=15, argv=0x60000fffffffb628)
>     at orterun.c:693
> #6  0x40000000000045e0 in main (argc=15, argv=0x60000fffffffb628) at main.c:13
> 
> S
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
