Gilles,

I applied your patch to v1.8 and it ran successfully on my SPARC machines.
Takahiro Kawashima,
MPI development team,
Fujitsu

> Kawashima-san and all,
>
> Here is attached a one-off patch for v1.8.
> /* it does not use the __attribute__ modifier, which might not be
>    supported by all compilers */
>
> As far as I am concerned, the same issue is also in the trunk,
> and if you do not hit it, it just means you are lucky :-)
>
> The same issue might also be in other parts of the code :-(
>
> Cheers,
>
> Gilles
>
> On 2014/08/08 13:45, Kawashima, Takahiro wrote:
> > Gilles, George,
> >
> > The problem is the one Gilles pointed out.
> > I temporarily modified the code below and the bus error disappeared.
> >
> > --- orte/util/nidmap.c (revision 32447)
> > +++ orte/util/nidmap.c (working copy)
> > @@ -885,7 +885,7 @@
> >      orte_proc_state_t state;
> >      orte_app_idx_t app_idx;
> >      int32_t restarts;
> > -    orte_process_name_t proc, dmn;
> > +    orte_process_name_t proc __attribute__((__aligned__(8))), dmn;
> >      char *hostname;
> >      uint8_t flag;
> >      opal_buffer_t *bptr;
> >
> > Takahiro Kawashima,
> > MPI development team,
> > Fujitsu
> >
> >> Kawashima-san,
> >>
> >> This is interesting :-)
> >>
> >> proc is on the stack and has type orte_process_name_t, with:
> >>
> >> typedef uint32_t orte_jobid_t;
> >> typedef uint32_t orte_vpid_t;
> >> struct orte_process_name_t {
> >>     orte_jobid_t jobid;  /**< Job number */
> >>     orte_vpid_t vpid;    /**< Process id - equivalent to rank */
> >> };
> >> typedef struct orte_process_name_t orte_process_name_t;
> >>
> >> so there is really no reason to align this on 8 bytes...
> >> But later, proc is cast to a uint64_t, so proc should have been
> >> aligned on 8 bytes. By then it is too late, and hence the glorious
> >> SIGBUS.
> >>
> >> This is loosely related to
> >> http://www.open-mpi.org/community/lists/devel/2014/08/15532.php
> >> (see heterogeneous.v2.patch).
> >> If we make opal_process_name_t a union of a uint64_t and a struct of
> >> two uint32_t, the compiler will align it on 8 bytes.
> >> Note the patch is not enough (and will not apply on the v1.8 branch
> >> anyway); we could simply remove orte_process_name_t and
> >> ompi_process_name_t and use only opal_process_name_t (and never
> >> declare variables with type opal_proc_name_t, otherwise alignment
> >> might be incorrect).
> >>
> >> As a workaround, you can declare an opal_process_name_t (for
> >> alignment) and cast it to an orte_process_name_t.
> >>
> >> I will write a patch (I will not be able to test on SPARC ...).
> >> Please note this issue might be present in other places.
> >>
> >> Cheers,
> >>
> >> Gilles
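To make the layout issue concrete, here is a minimal sketch of the union Gilles describes, with illustrative member names rather than the actual Open MPI definitions: because one member is a uint64_t, the compiler must give the whole object 8-byte alignment, so a later read through a uint64_t pointer cannot fault.

    #include <stdint.h>

    /* Illustrative sketch, not the real Open MPI type: a union with a
     * uint64_t member inherits that member's 8-byte alignment, so any
     * instance (on the stack or elsewhere) is safe to read through a
     * uint64_t pointer. */
    typedef union {
        uint64_t opaque;        /* forces 8-byte alignment */
        struct {
            uint32_t jobid;
            uint32_t vpid;
        } name;
    } aligned_process_name_t;

    /* By contrast, a bare struct of two uint32_t fields is only
     * guaranteed 4-byte alignment; on SPARC, dereferencing
     * (uint64_t*)&proc at an address that is merely 4-byte aligned
     * raises SIGBUS ("invalid address alignment"). */

This is also why Gilles's interim workaround works: declare the 8-byte-aligned type for storage and cast it to orte_process_name_t wherever the narrower view is needed.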
> >> On 2014/08/08 13:03, Kawashima, Takahiro wrote:
> >>> Hi,
> >>>
> >>>>>>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> >>>>>>>> 10 Sparc and I receive a bus error if I run a small program.
> >>>
> >>> I've finally reproduced the bus error in my SPARC environment.
> >>>
> >>> #0  0xffffffff00db4740 (__waitpid_nocancel + 0x44) (0x200,0x0,0x0,0xa0,0xfffff80100064af0,0x35b4)
> >>> #1  0xffffffff0001a310 (handle_signal + 0x574) (signo=10,info=(struct siginfo *) 0x000007feffffd100,p=(void *) 0x000007feffffd100) at line 277 in ../sigattach.c
> >>> <SIGNAL HANDLER>
> >>> #2  0xffffffff0282aff4 (store + 0x540) (uid=(unsigned long *) 0xffffffff0118a128,scope=8:'\b',key=(char *) 0xffffffff0106a0a8 "opal.local.ldr",data=(void *) 0x000007feffffde74,type=15:'\017') at line 252 in db_hash.c
> >>> #3  0xffffffff01266350 (opal_db_base_store + 0xc4) (proc=(unsigned long *) 0xffffffff0118a128,scope=8:'\b',key=(char *) 0xffffffff0106a0a8 "opal.local.ldr",object=(void *) 0x000007feffffde74,type=15:'\017') at line 49 in db_base_fns.c
> >>> #4  0xffffffff00fdbab4 (orte_util_decode_pidmap + 0x790) (bo=(struct *) 0x0000000000281d70) at line 975 in nidmap.c
> >>> #5  0xffffffff00fd6d20 (orte_util_nidmap_init + 0x3dc) (buffer=(struct opal_buffer_t *) 0x0000000000241fc0) at line 141 in nidmap.c
> >>> #6  0xffffffff01e298cc (rte_init + 0x2a0) () at line 153 in ess_env_module.c
> >>> #7  0xffffffff00f9f28c (orte_init + 0x308) (pargc=(int *) 0x0000000000000000,pargv=(char ***) 0x0000000000000000,flags=32) at line 148 in orte_init.c
> >>> #8  0xffffffff001a6f08 (ompi_mpi_init + 0x31c) (argc=1,argv=(char **) 0x000007fefffff348,requested=0,provided=(int *) 0x000007feffffe698) at line 464 in ompi_mpi_init.c
> >>> #9  0xffffffff001ff79c (MPI_Init + 0x2b0) (argc=(int *) 0x000007feffffe814,argv=(char ***) 0x000007feffffe818) at line 84 in init.c
> >>> #10 0x0000000000100ae4 (main + 0x44) (argc=1,argv=(char **) 0x000007fefffff348) at line 8 in mpiinitfinalize.c
> >>> #11 0xffffffff00d2b81c (__libc_start_main + 0x194) (0x100aa0,0x1,0x7fefffff348,0x100d24,0x100d14,0x0)
> >>> #12 0x000000000010094c (_start + 0x2c) ()
> >>>
> >>> Line 252 in opal/mca/db/hash/db_hash.c is:
> >>>
> >>>     case OPAL_UINT64:
> >>>         if (NULL == data) {
> >>>             OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);
> >>>             return OPAL_ERR_BAD_PARAM;
> >>>         }
> >>>         kv->type = OPAL_UINT64;
> >>>         kv->data.uint64 = *(uint64_t*)(data);  // !!! here !!!
> >>>         break;
> >>>
> >>> My environment is:
> >>>
> >>>   Open MPI v1.8 branch r32447 (latest)
> >>>   configure --enable-debug
> >>>   SPARC-V9 (Fujitsu SPARC64 IXfx)
> >>>   Linux (custom)
> >>>   gcc 4.2.4
> >>>
> >>> I could not reproduce it with the Open MPI trunk nor with the Fujitsu compiler.
> >>>
> >>> Can this information help?
> >>>
> >>> Takahiro Kawashima,
> >>> MPI development team,
> >>> Fujitsu
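For reference, a portable way to avoid this whole class of fault is to copy the bytes rather than dereference a possibly misaligned pointer. A minimal sketch follows; it assumes only that data points at 8 valid bytes, and it is not the fix that was actually committed (the committed patch addresses alignment at the declaration site instead).

    #include <stdint.h>
    #include <string.h>

    /* Alignment-safe 64-bit load: memcpy imposes no alignment
     * requirement on its source, and compilers typically lower this
     * to a single load instruction when the pointer happens to be
     * aligned anyway. */
    static uint64_t load_uint64(const void *data)
    {
        uint64_t value;
        memcpy(&value, data, sizeof(value));
        return value;
    }

    /* The faulting line could then be written as:
     *     kv->data.uint64 = load_uint64(data);
     */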
> >>>> Hi,
> >>>>
> >>>> I'm sorry once more to answer late, but the last two days our mail
> >>>> server was down (hardware error).
> >>>>
> >>>>> Did you configure this --enable-debug?
> >>>>
> >>>> Yes, I used the following command.
> >>>>
> >>>> ../openmpi-1.8.2rc3/configure --prefix=/usr/local/openmpi-1.8.2_64_gcc \
> >>>>   --libdir=/usr/local/openmpi-1.8.2_64_gcc/lib64 \
> >>>>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
> >>>>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
> >>>>   JAVA_HOME=/usr/local/jdk1.8.0 \
> >>>>   LDFLAGS="-m64 -L/usr/local/gcc-4.9.0/lib/amd64" \
> >>>>   CC="gcc" CXX="g++" FC="gfortran" \
> >>>>   CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
> >>>>   CPP="cpp" CXXCPP="cpp" \
> >>>>   CPPFLAGS="" CXXCPPFLAGS="" \
> >>>>   --enable-mpi-cxx \
> >>>>   --enable-cxx-exceptions \
> >>>>   --enable-mpi-java \
> >>>>   --enable-heterogeneous \
> >>>>   --enable-mpi-thread-multiple \
> >>>>   --with-threads=posix \
> >>>>   --with-hwloc=internal \
> >>>>   --without-verbs \
> >>>>   --with-wrapper-cflags="-std=c11 -m64" \
> >>>>   --enable-debug \
> >>>>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
> >>>>
> >>>>> If so, you should get a line number in the backtrace
> >>>>
> >>>> I got them with gdb (see below), but not with dbx.
> >>>>
> >>>> Kind regards
> >>>>
> >>>> Siegmar
> >>>>
> >>>>> On Aug 5, 2014, at 2:59 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm sorry to answer so late, but last week I didn't have Internet
> >>>>>> access. In the meantime I've installed openmpi-1.8.2rc3 and I get
> >>>>>> the same error.
> >>>>>>
> >>>>>>> This looks like the typical type of alignment error that we used
> >>>>>>> to see when testing regularly on SPARC. :-\
> >>>>>>>
> >>>>>>> It looks like the error was happening in mca_db_hash.so. Could
> >>>>>>> you get a stack trace / file+line number where it was failing
> >>>>>>> in mca_db_hash? (i.e., the actual bad code will likely be under
> >>>>>>> opal/mca/db/hash somewhere)
> >>>>>>
> >>>>>> Unfortunately I don't get a file+line number from a file in
> >>>>>> opal/mca/db/hash.
> >>>>>>
> >>>>>> tyr small_prog 102 ompi_info | grep MPI:
> >>>>>>                 Open MPI: 1.8.2rc3
> >>>>>> tyr small_prog 103 which mpicc
> >>>>>> /usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
> >>>>>> tyr small_prog 104 mpicc init_finalize.c
> >>>>>> tyr small_prog 106 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> >>>>>> For information about new features see `help changes'
> >>>>>> To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
> >>>>>> Reading mpiexec
> >>>>>> Reading ld.so.1
> >>>>>> Reading libopen-rte.so.7.0.4
> >>>>>> Reading libopen-pal.so.6.2.0
> >>>>>> Reading libsendfile.so.1
> >>>>>> Reading libpicl.so.1
> >>>>>> Reading libkstat.so.1
> >>>>>> Reading liblgrp.so.1
> >>>>>> Reading libsocket.so.1
> >>>>>> Reading libnsl.so.1
> >>>>>> Reading libgcc_s.so.1
> >>>>>> Reading librt.so.1
> >>>>>> Reading libm.so.2
> >>>>>> Reading libpthread.so.1
> >>>>>> Reading libc.so.1
> >>>>>> Reading libdoor.so.1
> >>>>>> Reading libaio.so.1
> >>>>>> Reading libmd.so.1
> >>>>>> (dbx) check -all
> >>>>>> access checking - ON
> >>>>>> memuse checking - ON
> >>>>>> (dbx) run -np 1 a.out
> >>>>>> Running: mpiexec -np 1 a.out
> >>>>>> (process id 27833)
> >>>>>> Reading rtcapihook.so
> >>>>>> Reading libdl.so.1
> >>>>>> Reading rtcaudit.so
> >>>>>> Reading libmapmalloc.so.1
> >>>>>> Reading libgen.so.1
> >>>>>> Reading libc_psr.so.1
> >>>>>> Reading rtcboot.so
> >>>>>> Reading librtc.so
> >>>>>> Reading libmd_psr.so.1
> >>>>>> RTC: Enabling Error Checking...
> >>>>>> RTC: Running program...
> >>>>>> Write to unallocated (wua) on thread 1:
> >>>>>> Attempting to write 1 byte at address 0xffffffff79f04000
> >>>>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> >>>>>> 0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> >>>>>> (dbx) where
> >>>>>> current thread: t@1
> >>>>>> =>[1] _readdir(0xffffffff79f00300, 0x2e6800, 0x4, 0x2d, 0x4, 0xffffffff79f00300), at 0xffffffff55174da0
> >>>>>>   [2] list_files_by_dir(0x100138fd8, 0xffffffff7fffd1f0, 0xffffffff7fffd1e8, 0xffffffff7fffd210, 0x0, 0xffffffff702a0010), at 0xffffffff63174594
> >>>>>>   [3] foreachfile_callback(0x100138fd8, 0xffffffff7fffd458, 0x0, 0x2e, 0x0, 0xffffffff702a0010), at 0xffffffff6317461c
> >>>>>>   [4] foreach_dirinpath(0x1001d8a28, 0x0, 0xffffffff631745e0, 0xffffffff7fffd458, 0x0, 0xffffffff702a0010), at 0xffffffff63171684
> >>>>>>   [5] lt_dlforeachfile(0x1001d8a28, 0xffffffff6319656c, 0x0, 0x53, 0x2f, 0xf), at 0xffffffff63174748
> >>>>>>   [6] find_dyn_components(0x0, 0xffffffff6323b570, 0x0, 0x1, 0xffffffff7fffd6a0, 0xffffffff702a0010), at 0xffffffff63195e38
> >>>>>>   [7] mca_base_component_find(0x0, 0xffffffff6323b570, 0xffffffff6335e1b0, 0x0, 0xffffffff7fffd6a0, 0x1), at 0xffffffff631954d8
> >>>>>>   [8] mca_base_framework_components_register(0xffffffff6335e1c0, 0x0, 0x3e, 0x0, 0x3b, 0x100800), at 0xffffffff631b1638
> >>>>>>   [9] mca_base_framework_register(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd8d0, 0x0, 0xffffffff702a0010), at 0xffffffff631b24d4
> >>>>>>   [10] mca_base_framework_open(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd990, 0x0, 0xffffffff702a0010), at 0xffffffff631b25d0
> >>>>>>   [11] opal_init(0xffffffff7fffdd70, 0xffffffff7fffdd78, 0x100117c60, 0xffffffff7fffde58, 0x400, 0x100117c60), at 0xffffffff63153694
> >>>>>>   [12] orterun(0x4, 0xffffffff7fffde58, 0x2, 0xffffffff7fffdda0, 0x0, 0xffffffff702a0010), at 0x100005078
> >>>>>>   [13] main(0x4, 0xffffffff7fffde58, 0xffffffff7fffde80, 0x100117c60, 0x100000000, 0xffffffff6a700200), at 0x100003d68
> >>>>>> (dbx)
> >>>>>>
> >>>>>> I get the following output with gdb.
> >>>>>>
> >>>>>> tyr small_prog 107 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> >>>>>> GNU gdb (GDB) 7.6.1
> >>>>>> Copyright (C) 2013 Free Software Foundation, Inc.
> >>>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >>>>>> This is free software: you are free to change and redistribute it.
> >>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >>>>>> and "show warranty" for details.
> >>>>>> This GDB was configured as "sparc-sun-solaris2.10".
> >>>>>> For bug reporting instructions, please see:
> >>>>>> <http://www.gnu.org/software/gdb/bugs/>...
> >>>>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done.
> >>>>>> (gdb) run -np 1 a.out
> >>>>>> Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 a.out
> >>>>>> [Thread debugging using libthread_db enabled]
> >>>>>> [New Thread 1 (LWP 1)]
> >>>>>> [New LWP 2 ]
> >>>>>> [tyr:27867] *** Process received signal ***
> >>>>>> [tyr:27867] Signal: Bus Error (10)
> >>>>>> [tyr:27867] Signal code: Invalid address alignment (1)
> >>>>>> [tyr:27867] Failing at address: ffffffff7fffd224
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfa0
> >>>>>> /lib/sparcv9/libc.so.1:0xd8b98
> >>>>>> /lib/sparcv9/libc.so.1:0xcc70c
> >>>>>> /lib/sparcv9/libc.so.1:0xcc918
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> >>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:main+0x20
> >>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:_start+0x7c
> >>>>>> [tyr:27867] *** End of error message ***
> >>>>>> --------------------------------------------------------------------------
> >>>>>> mpiexec noticed that process rank 0 with PID 27867 on node tyr exited on signal 10 (Bus Error).
> >>>>>> --------------------------------------------------------------------------
> >>>>>> [LWP 2 exited]
> >>>>>> [New Thread 2 ]
> >>>>>> [Switching to Thread 1 (LWP 1)]
> >>>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> >>>>>> (gdb) bt
> >>>>>> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> >>>>>> #8  0xffffffff7ec7746c in vm_close () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> >>>>>> #9  0xffffffff7ec74a4c in lt_dlclose () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> >>>>>> #10 0xffffffff7ec99b70 in ri_destructor (obj=0x1001ead30) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:391
> >>>>>> #11 0xffffffff7ec98488 in opal_obj_run_destructors (object=0x1001ead30) at ../../../../openmpi-1.8.2rc3/opal/class/opal_object.h:446
> >>>>>> #12 0xffffffff7ec993ec in mca_base_component_repository_release (component=0xffffffff7b023cf0 <mca_oob_tcp_component>) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:244
> >>>>>> #13 0xffffffff7ec9b734 in mca_base_component_unload (component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:47
> >>>>>> #14 0xffffffff7ec9b7c8 in mca_base_component_close (component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:60
> >>>>>> #15 0xffffffff7ec9b89c in mca_base_components_close (output_id=-1, components=0xffffffff7f12b430 <orte_oob_base_framework+80>, skip=0x0) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:86
> >>>>>> #16 0xffffffff7ec9b804 in mca_base_framework_components_close (framework=0xffffffff7f12b3e0 <orte_oob_base_framework>, skip=0x0) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:66
> >>>>>> #17 0xffffffff7efae1e4 in orte_oob_base_close () at ../../../../openmpi-1.8.2rc3/orte/mca/oob/base/oob_base_frame.c:94
> >>>>>> #18 0xffffffff7ecb28ac in mca_base_framework_close (framework=0xffffffff7f12b3e0 <orte_oob_base_framework>) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_framework.c:187
> >>>>>> #19 0xffffffff7bf078c0 in rte_finalize () at ../../../../../openmpi-1.8.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:858
> >>>>>> #20 0xffffffff7ef30a44 in orte_finalize () at ../../openmpi-1.8.2rc3/orte/runtime/orte_finalize.c:65
> >>>>>> #21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0e8) at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/orterun.c:1096
> >>>>>> #22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0e8) at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/main.c:13
> >>>>>> (gdb)
> >>>>>>
> >>>>>> Is the above information helpful to track down the error? Do you need
> >>>>>> anything else? Thank you very much for any help in advance.
> >>>>>>
> >>>>>> Kind regards
> >>>>>>
> >>>>>> Siegmar
> >>>>>>
> >>>>>>> On Jul 25, 2014, at 2:08 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> >>>>>>>> 10 Sparc and I receive a bus error if I run a small program.
> >>>>>>>>
> >>>>>>>> tyr hello_1 105 mpiexec -np 2 a.out
> >>>>>>>> [tyr:29164] *** Process received signal ***
> >>>>>>>> [tyr:29164] Signal: Bus Error (10)
> >>>>>>>> [tyr:29164] Signal code: Invalid address alignment (1)
> >>>>>>>> [tyr:29164] Failing at address: ffffffff7fffd1c4
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
> >>>>>>>> /lib/sparcv9/libc.so.1:0xd8b98
> >>>>>>>> /lib/sparcv9/libc.so.1:0xcc70c
> >>>>>>>> /lib/sparcv9/libc.so.1:0xcc918
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> >>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> >>>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
> >>>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
> >>>>>>>> [tyr:29164] *** End of error message ***
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>> I get the following output if I run the program in "dbx".
> >>>>>>>>
> >>>>>>>> ...
> >>>>>>>> RTC: Enabling Error Checking...
> >>>>>>>> RTC: Running program...
> >>>>>>>> Write to unallocated (wua) on thread 1:
> >>>>>>>> Attempting to write 1 byte at address 0xffffffff79f04000
> >>>>>>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> >>>>>>>> 0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> >>>>>>>> (dbx)
> >>>>>>>>
> >>>>>>>> Hopefully the above output helps to fix the error. Can I provide
> >>>>>>>> anything else? Thank you very much for any help in advance.
> >>>>>>>>
> >>>>>>>> Kind regards
> >>>>>>>>
> >>>>>>>> Siegmar