On 06/19/12 23:11, Ralph Castain wrote:
Also, how did you configure this version?
  --enable-heterogeneous
  --enable-cxx-exceptions
  --enable-shared
  --enable-orterun-prefix-by-default
  --with-sge
  --enable-mpi-f90
  --with-mpi-f90-size=small
  --disable-peruse
  --disable-mpi-thread-multiple
  --disable-debug
  --disable-mem-debug
  --disable-mem-profile
  --enable-contrib-no-build=vt

  If you had --disable-static, then there was a bug that would indeed cause a 
hang. Just committing that fix now.
I still get a hang even with r26623.
On Jun 19, 2012, at 9:01 PM, Ralph Castain wrote:
See if it works with -mca orte_use_common_port 0

I get a segfault:

[remote1:01409] *** Process received signal ***
[remote1:01409] Signal: Segmentation Fault (11)
[remote1:01409] Signal code: Address not mapped (1)
[remote1:01409] Failing at address: 2c
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'show_stackframe+0x7d0
/lib/amd64/libc.so.1'__sighndlr+0x6
/lib/amd64/libc.so.1'call_user_handler+0x2c5
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'orte_grpcomm_base_rollup_recv+0x73 [Signal 11 (SEGV)]
/home/eugene/r26609/lib/openmpi/mca_rml_oob.so'orte_rml_recv_msg_callback+0x9c
/home/eugene/r26609/lib/openmpi/mca_oob_tcp.so'mca_oob_tcp_msg_data+0x283
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'event_process_active_single_queue+0x54c
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'event_process_active+0x41
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'opal_libevent2019_event_base_loop+0x606
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'orte_daemon+0xd6d
/home/eugene/r26609/bin/orted'0xd4b
[remote1:01409] *** End of error message ***
Segmentation Fault (core dumped)


On Jun 19, 2012, at 8:31 PM, Eugene Loh wrote:
I'm having bad luck with the trunk starting with r26609.  Basically, things 
hang if I run

   mpirun -H remote1,remote2 -n 2 hostname

where remote1 and remote2 are remote nodes.
