Hi,
I am experiencing several fancy bugs with ORTE.

All bugs occur on 32-bit Intel architecture under Mac OS X using gcc 4.2. The tested version is today's trunk (the bugs have also occurred for at least three weeks).

The first occurs when compiling in "optimized" mode (i.e. configure --disable-debug --with-platform=optimized) and does not occur in debug mode.

~/ompi$ mpirun -np 1 echo foo
[laptop20:22960] *** Process received signal ***
[laptop20:22960] Signal: Bus error (10)
[laptop20:22960] Signal code:  (0)
[laptop20:22960] Failing at address: 0x0
[ 1] [0xbffff698, 0x00000000] (-P-)
[ 2] (mca_oob_base_init + 0x26) [0xbffff6e8, 0x000878a6]
[ 3] (orte_rml_oob_init + 0x11) [0xbffff6f8, 0x00032f21]
[ 4] (orte_rml_base_select + 0xc5) [0xbffff778, 0x0009f415]
[ 5] (orte_init_stage1 + 0x20c) [0xbffff848, 0x000678cc]
[ 6] (orte_system_init + 0x1d) [0xbffff868, 0x0006b03d]
[ 7] (orte_init + 0x7d) [0xbffff888, 0x000674ad]
[ 8] (orterun:F(0,1)=r(0,1);-2147483648;2147483647; + 0x220) [0xbffff938, 0x00002008]
[ 9] (main:F(0,1)=r(0,1);-2147483648;2147483647; + 0x18) [0xbffff948, 0x00001de6]
[10] (_start + 0xd8) [0xbffff988, 0x00001db2]
[11] (start + 0x29) [0xbffff9a0, 0x00001cd9]
[12] [0x00000000, 0x00000005] (FP-)
[laptop20:22960] *** End of error message ***
Bus error


The other one occurs when running an MPI program without mpirun (I know this is pretty useless but still ;) ). This bug does not require specific compilation options to occur. Running mpirun -np 1 mympiprogram is fine, but running mympiprogram directly fails with a segfault in MPI_Finalize:

~/ompi$ mpirun -np 1 mpiself
~/ompi$ gdb mpiself
(gdb) r
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x77767578
0x90002e46 in szone_malloc ()
(gdb) bt
#0  0x90002e46 in szone_malloc ()
#1  0x0042b6da in opal_memory_darwin_malloc (zone=0x2000000, size=48) at memory_darwin_component.c:103
#2  0x90002a2f in malloc ()
#3  0x00421548 in opal_malloc (size=48, file=0x274fd4 "../../../opal/class/opal_object.h", line=468) at malloc.c:96
#4  0x002218e4 in opal_obj_new (cls=0x27d840) at ../../../opal/class/opal_object.h:468
#5  0x00221851 in opal_obj_new_debug (type=0x27d840, file=0x275424 "base/gpr_base_create_value_keyval.c", line=43) at ../../../opal/class/opal_object.h:247
#6  0x0022147e in orte_gpr_base_create_value (value=0xbffff8fc, addr_mode=32769, segment=0x510150 "orte-job-0", cnt=2, num_tokens=0) at base/gpr_base_create_value_keyval.c:43
#7  0x00269b79 in orte_smr_base_set_proc_state (proc=0x507d00, state=32, exit_status=0) at base/smr_base_set_proc_state.c:54
#8  0x01035f21 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:145
#9  0x0106ea09 in MPI_Finalize () at finalize.c:44
#10 0x00001e5e in main (argc=1, argv=0xbffffb70) at mpiself.c:44
(gdb)
