I just (yesterday) made the move from LAM/MPI to Open MPI.  The configure / compile / install went smoothly (version 1.1.1).  However, after recompiling my source, the program usually crashes in MPI_Init when I run it.  The crash seems to come from the same place most of the time, and it usually prints a message like this:

Signal:10 info.si_errno:0(Unknown error: 0) si_code:1(BUS_ADRALN)
Failing at addr:0xfdff8018
*** End of error message ***
Signal:10 info.si_errno:0(Unknown error: 0) si_code:1(BUS_ADRALN)
Failing at addr:0x2807000
*** End of error message ***
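
For what it's worth, here is a quick sanity check on the two "Failing at addr" values above, since the signal code claims an alignment problem (BUS_ADRALN). This is only a sketch: the helper names is_aligned and report_alignment are made up for this post and are not part of my program or of Open MPI.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr is a multiple of align, 0 otherwise.
 * align must be a nonzero power of two (hypothetical helper). */
static int is_aligned(uint32_t addr, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr & (align - 1u)) == 0u;
}

/* Prints whether addr is aligned to 4 bytes, 8 bytes, and a 4096-byte page. */
static void report_alignment(uint32_t addr)
{
    static const uint32_t aligns[] = { 4u, 8u, 4096u };
    size_t i;

    for (i = 0; i < sizeof aligns / sizeof aligns[0]; i++) {
        printf("0x%08lx %s aligned to %lu bytes\n",
               (unsigned long)addr,
               is_aligned(addr, aligns[i]) ? "is" : "is NOT",
               (unsigned long)aligns[i]);
    }
}
```

Calling report_alignment(0xfdff8018u) and report_alignment(0x2807000u) from a small main shows that both addresses are multiples of 4 and 8, and the second is also a multiple of 4096. So the addresses themselves do not look misaligned, although I can't say what that means for the actual cause.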

The test system (before moving back to the cluster) is a G4 PowerBook running OS X 10.4.8 (not using Xgrid at the moment).  I'm oversubscribing it: I run 2 processes, and Open MPI knows there is only one processor.  Attached is the config info from the install.  The crash reports are listed below; in every one of them the failing call is made from the mca_bml_r2_progress function.  Any help is much appreciated.

Karl

CRASH 1:
Command: nm
Path:    /Users/karl/programs/nm/build/Release/nm
Parent:  orted [830]

Version: ??? (???)

PID:    834
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0xfdff8018

Thread 0 Crashed:
0   mca_btl_sm.so       0x003abbec mca_btl_sm_component_progress + 3164
1   mca_bml_r2.so       0x003a0d38 mca_bml_r2_progress + 88
2   libopal.0.dylib     0x0032309c opal_progress + 236
3   mca_oob_tcp.so      0x00024f14 mca_oob_tcp_msg_wait + 52
4   mca_oob_tcp.so      0x0002a0a8 mca_oob_tcp_recv + 1128
5   liborte.0.dylib     0x002f07b0 mca_oob_recv_packed + 80
6   mca_gpr_proxy.so    0x00059bd4 orte_gpr_proxy_put + 804
7   liborte.0.dylib     0x00304318 orte_soh_base_set_proc_soh + 968
8   libmpi.0.dylib      0x00222d88 ompi_mpi_init + 1816
9   libmpi.0.dylib      0x00248b50 MPI_Init + 240
10  nm                  0x00002e60 init_model + 48
11  nm                  0x00002c70 main + 48
12  nm                  0x00002494 _start + 340 (crt.c:272)
13  nm                  0x0000233c start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x00000000003abbec srr1: 0x000000000200f930                        vrsave: 0x0000000000000000
    cr: 0x28004222          xer: 0x0000000000000004   lr: 0x00000000003aafa0  ctr: 0x00000000003aaf90
    r0: 0x0000000000000000   r1: 0x00000000bfffe8d0   r2: 0x00000000fdff8000   r3: 0x0000000000000001
    r4: 0x0000000000049814   r5: 0x00000000bfffe888   r6: 0x0000000000000000   r7: 0x00000000fdff8000
    r8: 0x0000000000000004   r9: 0x00000000004177e0  r10: 0x0000000000000004  r11: 0x0000000000000000
   r12: 0x00000000003aaf90  r13: 0x00000000fffffffe  r14: 0x00000000003ad004  r15: 0x00000000003441e8
   r16: 0x00000000003ad8c4  r17: 0x0000000000000004  r18: 0x0000000000000000  r19: 0x0000000000000000
   r20: 0x0000000000000014  r21: 0x0000000000000000  r22: 0x00000000003ae0c4  r23: 0x0000000000000001
   r24: 0x0000000000000000  r25: 0x0000000000000004  r26: 0x0000000000029c50  r27: 0x0000000000000000
   r28: 0x0000000000000000  r29: 0x0000000000000001  r30: 0x0000000000000000  r31: 0x00000000003aafa0



CRASH 2:
Command: nm
Path:    /Users/karl/programs/nm/build/Release/nm
Parent:  orted [830]

Version: ??? (???)

PID:    832
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Thread 0 Crashed:
0   <<00000000>>        0x00000000 0 + 0
1   mca_bml_r2.so       0x003a0d38 mca_bml_r2_progress + 88
2   libopal.0.dylib     0x0032309c opal_progress + 236
3   mca_oob_tcp.so      0x00024f14 mca_oob_tcp_msg_wait + 52
4   mca_oob_tcp.so      0x0002a0a8 mca_oob_tcp_recv + 1128
5   liborte.0.dylib     0x002f07b0 mca_oob_recv_packed + 80
6   mca_gpr_proxy.so    0x00059bd4 orte_gpr_proxy_put + 804
7   liborte.0.dylib     0x00304318 orte_soh_base_set_proc_soh + 968
8   libmpi.0.dylib      0x00222d88 ompi_mpi_init + 1816
9   libmpi.0.dylib      0x00248b50 MPI_Init + 240
10  nm                  0x00002e60 init_model + 48
11  nm                  0x00002c70 main + 48
12  nm                  0x00002494 _start + 340 (crt.c:272)
13  nm                  0x0000233c start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x0000000000000000 srr1: 0x000000004000d930                        vrsave: 0x0000000000000000
    cr: 0x28004222          xer: 0x0000000000000004   lr: 0x00000000003abe5c  ctr: 0x0000000000000000
    r0: 0x0000000000000000   r1: 0x00000000bfffe8d0   r2: 0x0000000002008000   r3: 0x00000000003ad864
    r4: 0x0000000000000000   r5: 0x0000000002008000   r6: 0x0000000000000000   r7: 0x0000000002008000
    r8: 0x00000000003ad8c4   r9: 0x00000000004177e0  r10: 0x0000000000000000  r11: 0x0000000000000000
   r12: 0x0000000000000000  r13: 0x00000000fffffffe  r14: 0x00000000003ad004  r15: 0x00000000003441e8
   r16: 0x00000000003ad8c4  r17: 0x0000000000000000  r18: 0x0000000000000000  r19: 0x0000000000000000
   r20: 0x0000000000000000  r21: 0x0000000000000000  r22: 0x00000000003ae0c4  r23: 0x00000000003441e8
   r24: 0x0000000000000000  r25: 0x0000000002008000  r26: 0x00000000003ae0c4  r27: 0x0000000000000001
   r28: 0x0000000000000004  r29: 0x0000000000000001  r30: 0x0000000000000000  r31: 0x00000000003aafa0




CRASH 3:
Command: nm
Path:    /Users/karl/programs/nm/build/Debug/nm
Parent:  orted [1790]

Version: ??? (???)

PID:    1794
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0xfdff8018

Thread 0 Crashed:
0   mca_btl_sm.so       0x003bcbec mca_btl_sm_component_progress + 3164
1   mca_bml_r2.so       0x003b1d38 mca_bml_r2_progress + 88
2   libopal.0.dylib     0x0032309c opal_progress + 236
3   mca_oob_tcp.so      0x00055f14 mca_oob_tcp_msg_wait + 52
4   mca_oob_tcp.so      0x0005b0a8 mca_oob_tcp_recv + 1128
5   liborte.0.dylib     0x002f07b0 mca_oob_recv_packed + 80
6   mca_gpr_proxy.so    0x00068bd4 orte_gpr_proxy_put + 804
7   liborte.0.dylib     0x00304318 orte_soh_base_set_proc_soh + 968
8   libmpi.0.dylib      0x00222d88 ompi_mpi_init + 1816
9   libmpi.0.dylib      0x00248b50 MPI_Init + 240
10  nm                  0x000028fc init_model + 80 (model.c:16)
11  nm                  0x00002644 main + 72 (main.c:16)
12  nm                  0x00001e54 _start + 340 (crt.c:272)
13  nm                  0x00001cfc start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x00000000003bcbec srr1: 0x000000000200f930                        vrsave: 0x0000000000000000
    cr: 0x28004222          xer: 0x0000000000000004   lr: 0x00000000003bbfa0  ctr: 0x00000000003bbf90
    r0: 0x0000000000000000   r1: 0x00000000bfffe8f0   r2: 0x00000000fdff8000   r3: 0x0000000000000001
    r4: 0x0000000000049814   r5: 0x00000000bfffe8a8   r6: 0x0000000000000000   r7: 0x00000000fdff8000
    r8: 0x0000000000000004   r9: 0x00000000004177d0  r10: 0x0000000000000004  r11: 0x0000000000000000
   r12: 0x00000000003bbf90  r13: 0x00000000fffffffe  r14: 0x00000000003be004  r15: 0x00000000003441e8
   r16: 0x00000000003be8c4  r17: 0x0000000000000004  r18: 0x0000000000000000  r19: 0x0000000000000000
   r20: 0x0000000000000014  r21: 0x0000000000000000  r22: 0x00000000003bf0c4  r23: 0x0000000000000001
   r24: 0x0000000000000000  r25: 0x0000000000000004  r26: 0x000000000005ac50  r27: 0x0000000000000000
   r28: 0x0000000000000000  r29: 0x0000000000000001  r30: 0x0000000000000000  r31: 0x00000000003bbfa0



CRASH 4:
Command: nm
Path:    /Users/karl/programs/nm/build/Debug/nm
Parent:  orted [1790]

Version: ??? (???)

PID:    1792
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Thread 0 Crashed:
0   <<00000000>>        0x00000000 0 + 0
1   mca_bml_r2.so       0x003b1d38 mca_bml_r2_progress + 88
2   libopal.0.dylib     0x0032309c opal_progress + 236
3   mca_oob_tcp.so      0x00055f14 mca_oob_tcp_msg_wait + 52
4   mca_oob_tcp.so      0x0005b0a8 mca_oob_tcp_recv + 1128
5   liborte.0.dylib     0x002f07b0 mca_oob_recv_packed + 80
6   mca_gpr_proxy.so    0x00068bd4 orte_gpr_proxy_put + 804
7   liborte.0.dylib     0x00304318 orte_soh_base_set_proc_soh + 968
8   libmpi.0.dylib      0x00222d88 ompi_mpi_init + 1816
9   libmpi.0.dylib      0x00248b50 MPI_Init + 240
10  nm                  0x000028fc init_model + 80 (model.c:16)
11  nm                  0x00002644 main + 72 (main.c:16)
12  nm                  0x00001e54 _start + 340 (crt.c:272)
13  nm                  0x00001cfc start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x0000000000000000 srr1: 0x000000004000d930                        vrsave: 0x0000000000000000
    cr: 0x28004222          xer: 0x0000000000000004   lr: 0x00000000003bce5c  ctr: 0x0000000000000000
    r0: 0x0000000000000000   r1: 0x00000000bfffe8f0   r2: 0x0000000002008000   r3: 0x00000000003be864
    r4: 0x0000000000000000   r5: 0x0000000002008000   r6: 0x0000000000000000   r7: 0x0000000002008000
    r8: 0x00000000003be8c4   r9: 0x00000000004177d0  r10: 0x0000000000000000  r11: 0x0000000000000000
   r12: 0x0000000000000000  r13: 0x00000000fffffffe  r14: 0x00000000003be004  r15: 0x00000000003441e8
   r16: 0x00000000003be8c4  r17: 0x0000000000000000  r18: 0x0000000000000000  r19: 0x0000000000000000
   r20: 0x0000000000000000  r21: 0x0000000000000000  r22: 0x00000000003bf0c4  r23: 0x00000000003441e8
   r24: 0x0000000000000000  r25: 0x0000000002008000  r26: 0x00000000003bf0c4  r27: 0x0000000000000001
   r28: 0x0000000000000004  r29: 0x0000000000000001  r30: 0x0000000000000000  r31: 0x00000000003bbfa0






Attachment: info.tar.gz
Description: GNU Zip compressed data
