Ralph,

Sorry to be the bearer of more bad news. The "good" news is I've seen the new warning regarding the lack of a loopback interface. The BAD news is that it is occurring on a Linux cluster that I've verified DOES have 'lo' configured on the front-end and compute nodes (UP and RUNNING according to ifconfig).
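
A way to cross-check the ifconfig result from inside a process is to read the interface flags directly. What follows is only a minimal sketch, assuming a Linux node that exposes /sys/class/net; it is not how Open MPI itself looks for a loopback interface, and the function name loopback_interfaces is hypothetical. The flag values are the standard IFF_UP and IFF_LOOPBACK bits from <linux/if.h>.

```python
import os

# Interface flag bits as defined in <linux/if.h>.
IFF_UP = 0x1
IFF_LOOPBACK = 0x8


def loopback_interfaces(sysfs_root="/sys/class/net"):
    """Return the names of interfaces that sysfs reports as UP and LOOPBACK.

    Hypothetical helper for illustration; assumes each interface directory
    under sysfs_root contains a 'flags' file holding a hex value.
    """
    found = []
    try:
        names = sorted(os.listdir(sysfs_root))
    except OSError:
        # sysfs not available (non-Linux host or restricted environment).
        return found
    for name in names:
        try:
            with open(os.path.join(sysfs_root, name, "flags")) as fh:
                flags = int(fh.read().strip(), 16)
        except (OSError, ValueError):
            # Skip entries that are not interfaces or cannot be parsed.
            continue
        if (flags & IFF_LOOPBACK) and (flags & IFF_UP):
            found.append(name)
    return found


if __name__ == "__main__":
    print(loopback_interfaces())
```

Running this on the front-end and on each compute node (for example via ssh) would show whether 'lo' is visible with both flags set from the point of view of an ordinary process, independent of ifconfig.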
Though run with "-np 2" the warning appears FIVE times.
ADDITIONALLY, there is a SEGV at exit!

Unfortunately, despite configuring with --enable-debug, I didn't get line numbers from the core (and there was no backtrace printed).

All of this appears below (and no, "-mca mtl psm" is not a typo or a joke).
Let me know what tracing flags to apply to gather the info needed to debug this.

-Paul

$ mpirun -mca btl sm,self -np 2 -host n15,n16 -mca mtl psm examples/ring_c
--------------------------------------------------------------------------
WARNING: No loopback interface was found. This can cause problems
when we spawn processes as they are likely to be unable to connect
back to their host daemon. Sadly, it may take awhile for the connect
attempt to fail, so you may experience a significant hang time.

You may wish to ctrl-c out of your job and activate loopback support
on at least one interface before trying again.
--------------------------------------------------------------------------
[... above message FOUR more times ...]
Process 1 exiting
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node n15 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

$ /sbin/ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:481228 errors:0 dropped:0 overruns:0 frame:0
          TX packets:481228 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:81039065 (77.2 MiB)  TX bytes:81039065 (77.2 MiB)

$ ssh n15 /sbin/ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:24885 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24885 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1509940 (1.4 MiB)  TX bytes:1509940 (1.4 MiB)

$ ssh n16 /sbin/ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:24938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24938 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1543408 (1.4 MiB)  TX bytes:1543408 (1.4 MiB)

$ gdb examples/ring_c core.29728
[...]
(gdb) where
#0  0x0000002a97a19980 in ?? ()
#1  <signal handler called>
#2  0x0000003a6d40607c in _Unwind_FindEnclosingFunction () from /lib64/libgcc_s.so.1
#3  0x0000003a6d406b57 in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
#4  0x0000003a6d406c4c in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#5  0x0000003a6c30ac50 in __pthread_unwind () from /lib64/tls/libpthread.so.0
#6  0x0000003a6c305202 in sigcancel_handler () from /lib64/tls/libpthread.so.0
#7  <signal handler called>
#8  0x0000003a6b6bd9a2 in poll () from /lib64/tls/libc.so.6
#9  0x0000002a978f8f7d in ?? ()
#10 0x002000010000000e in ?? ()
#11 0x0000000000000000 in ?? ()

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900