Got it! Will take a little thinking to fix - it's basically a conflict between rollup and tree spawn. For now, you can run with:
-mca orte_use_common_port 0 -mca plm_rsh_no_tree_spawn 1

Sorry about that - thanks for letting me know!
Ralph

On Jun 20, 2012, at 9:48 PM, Eugene Loh wrote:

> On 06/19/12 23:11, Ralph Castain wrote:
>> Also, how did you configure this version?
> --enable-heterogeneous
> --enable-cxx-exceptions
> --enable-shared
> --enable-orterun-prefix-by-default
> --with-sge
> --enable-mpi-f90
> --with-mpi-f90-size=small
> --disable-peruse
> --disable-mpi-thread-multiple
> --disable-debug
> --disable-mem-debug
> --disable-mem-profile
> --enable-contrib-no-build=vt
>
>> If you had --disable-static, then there was a bug that would indeed cause a
>> hang. Just committing that fix now.
> I still get a hang even with r26623.
>> On Jun 19, 2012, at 9:01 PM, Ralph Castain wrote:
>>> See if it works with -mca orte_use_common_port 0
>
> I get a segfault:
>
> [remote1:01409] *** Process received signal ***
> [remote1:01409] Signal: Segmentation Fault (11)
> [remote1:01409] Signal code: Address not mapped (1)
> [remote1:01409] Failing at address: 2c
> /home/eugene/r26609/lib/libopen-rte.so.0.0.0'show_stackframe+0x7d0
> /lib/amd64/libc.so.1'__sighndlr+0x6
> /lib/amd64/libc.so.1'call_user_handler+0x2c5
> /home/eugene/r26609/lib/libopen-rte.so.0.0.0'orte_grpcomm_base_rollup_recv+0x73
> [Signal 11 (SEGV)]
> /home/eugene/r26609/lib/openmpi/mca_rml_oob.so'orte_rml_recv_msg_callback+0x9c
> /home/eugene/r26609/lib/openmpi/mca_oob_tcp.so'mca_oob_tcp_msg_data+0x283
> /home/eugene/r26609/lib/libopen-rte.so.0.0.0'event_process_active_single_queue+0x54c
> /home/eugene/r26609/lib/libopen-rte.so.0.0.0'event_process_active+0x41
> /home/eugene/r26609/lib/libopen-rte.so.0.0.0'opal_libevent2019_event_base_loop+0x606
> /home/eugene/r26609/lib/libopen-rte.so.0.0.0'orte_daemon+0xd6d
> /home/eugene/r26609/bin/orted'0xd4b
> [remote1:01409] *** End of error message ***
> Segmentation Fault (core dumped)
>
>>>
>>> On Jun 19, 2012, at 8:31 PM, Eugene Loh wrote:
>>>> I'm having bad luck with the trunk starting with r26609. Basically,
>>>> things hang if I run
>>>>
>>>> mpirun -H remote1,remote2 -n 2 hostname
>>>>
>>>> where remote1 and remote2 are remote nodes.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel