I can confirm this -- running a simple MPI "hello world" on one node with the rsh pls, the MPI processes finish and exit, but then orterun hangs:

(gdb) bt
#0  0xb7e8ef88 in poll () from /lib/libc.so.6
#1  0xb7f4f8a5 in poll_dispatch (arg=0xb7f6f080, tv=0xbfffe4f8) at poll.c:196
#2  0xb7f4d72b in opal_event_loop (flags=1) at event.c:515
#3  0xb7f5ac6e in opal_progress () at opal_progress.c:211
#4  0xb7d6fca1 in opal_condition_wait (c=0xb7d7242c, m=0xb7d72418)
    at condition.h:72
#5  0xb7d6f7f0 in orte_pls_rsh_finalize () at pls_rsh_module.c:833
#6  0xb7fb3ab6 in orte_pls_base_finalize () at pls_base_close.c:40
#7  0xb7d9092f in orte_rmgr_urm_finalize () at rmgr_urm.c:336
#8  0xb7fc14f7 in orte_rmgr_base_close () at rmgr_base_close.c:33
#9  0xb7fd3563 in orte_system_finalize () at orte_system_finalize.c:61
#10 0xb7fceca5 in orte_finalize () at orte_finalize.c:36
#11 0x0804a0d9 in main (argc=4, argv=0xbfffe6d4) at orterun.c:390

Am investigating...

On Aug 3, 2005, at 10:55 PM, Ralph H. Castain wrote:

Hmmm...it was running for me last night and (I thought) this morning,
but I'll test it again and see if I can reproduce the problem. Could
be something crept in there.

At 06:28 PM 8/3/2005, you wrote:
I just noticed that mpirun hangs forever inside the
orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed
on PPC64, x86-64, x86-32.

Don't have the immediate time, at the moment, to dig deeper, but
thought I would throw that out there.


Josh Hursey

devel mailing list

{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/