Kawashima-san and all,

Attached is a one-off patch for v1.8. /* it does not use the __attribute__ modifier, which might not be supported by all compilers */
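To make the idea behind the patch concrete, here is a tiny standalone example; the typedefs are simplified stand-ins for the real ORTE/OPAL declarations and the values are arbitrary, so treat it as a sketch only. The trick is to declare the 64-bit opal_identifier_t (which the ABI aligns on 8 bytes) and only access it through an orte_process_name_t pointer:

/* sketch of the workaround used in the attached patch;
 * simplified stand-in types, not the real ORTE/OPAL headers */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t opal_identifier_t;                      /* 8-byte aligned by the ABI */
typedef struct { uint32_t jobid; uint32_t vpid; } orte_process_name_t; /* only 4-byte alignment required */

int main(void)
{
    /* declare the 64-bit identifier so the storage is 8-byte aligned,
     * then manipulate it through an orte_process_name_t pointer */
    opal_identifier_t _proc;
    orte_process_name_t *proc = (orte_process_name_t *)&_proc;

    proc->jobid = 123;   /* arbitrary values, just for the example */
    proc->vpid  = 0;

    /* the db/hash store path effectively does *(uint64_t*)proc;
     * that is now safe because &_proc is 8-byte aligned */
    printf("id = %llu\n", (unsigned long long)*(uint64_t *)proc);
    return 0;
}

This is the same type punning the db/hash store path already performs; the union discussed in the quoted mail below is the cleaner way to express it (see the sketch just before the patch).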
As far as I am concerned, the same issue is also in the trunk, and if you do not hit it, it just means you are lucky :-)
The same issue might also be present in other parts of the code :-(

Cheers,

Gilles

On 2014/08/08 13:45, Kawashima, Takahiro wrote:
> Gilles, George,
>
> The problem is the one Gilles pointed out.
> I temporarily modified the code below and the bus error disappeared.
>
> --- orte/util/nidmap.c (revision 32447)
> +++ orte/util/nidmap.c (working copy)
> @@ -885,7 +885,7 @@
>      orte_proc_state_t state;
>      orte_app_idx_t app_idx;
>      int32_t restarts;
> -    orte_process_name_t proc, dmn;
> +    orte_process_name_t proc __attribute__((__aligned__(8))), dmn;
>      char *hostname;
>      uint8_t flag;
>      opal_buffer_t *bptr;
>
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
>
>> Kawashima-san,
>>
>> This is interesting :-)
>>
>> proc is on the stack and has type orte_process_name_t
>>
>> with
>>
>> typedef uint32_t orte_jobid_t;
>> typedef uint32_t orte_vpid_t;
>> struct orte_process_name_t {
>>     orte_jobid_t jobid;   /**< Job number */
>>     orte_vpid_t vpid;     /**< Process id - equivalent to rank */
>> };
>> typedef struct orte_process_name_t orte_process_name_t;
>>
>> so there is really no reason to align this on 8 bytes...
>> but later, proc is cast to a uint64_t ...
>> so proc should have been aligned on 8 bytes, but it is too late,
>> and hence the glorious SIGBUS.
>>
>> This is loosely related to
>> http://www.open-mpi.org/community/lists/devel/2014/08/15532.php
>> (see heterogeneous.v2.patch)
>> If we make opal_process_name_t a union of a uint64_t and a struct of two
>> uint32_t, the compiler will align this on 8 bytes.
>> Note the patch is not enough (and will not apply on the v1.8 branch anyway);
>> we could simply remove orte_process_name_t and ompi_process_name_t and
>> use only opal_process_name_t (and never declare variables with type
>> opal_proc_name_t, otherwise alignment might be incorrect).
>>
>> As a workaround, you can declare an opal_process_name_t (for alignment)
>> and cast it to an orte_process_name_t.
>>
>> I will write a patch (I will not be able to test on sparc ...)
>> Please note this issue might be present in other places.
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/08 13:03, Kawashima, Takahiro wrote:
>>> Hi,
>>>
>>>>>>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
>>>>>>>> 10 Sparc and I receive a bus error, if I run a small program.
>>> I've finally reproduced the bus error in my SPARC environment.
>>>
>>> #0 0xffffffff00db4740 (__waitpid_nocancel + 0x44) (0x200,0x0,0x0,0xa0,0xfffff80100064af0,0x35b4)
>>> #1 0xffffffff0001a310 (handle_signal + 0x574) (signo=10,info=(struct siginfo *) 0x000007feffffd100,p=(void *) 0x000007feffffd100) at line 277 in ../sigattach.c
>>> <SIGNAL HANDLER>
>>> #2 0xffffffff0282aff4 (store + 0x540) (uid=(unsigned long *) 0xffffffff0118a128,scope=8:'\b',key=(char *) 0xffffffff0106a0a8 "opal.local.ldr",data=(void *) 0x000007feffffde74,type=15:'\017') at line 252 in db_hash.c
>>> #3 0xffffffff01266350 (opal_db_base_store + 0xc4) (proc=(unsigned long *) 0xffffffff0118a128,scope=8:'\b',key=(char *) 0xffffffff0106a0a8 "opal.local.ldr",object=(void *) 0x000007feffffde74,type=15:'\017') at line 49 in db_base_fns.c
>>> #4 0xffffffff00fdbab4 (orte_util_decode_pidmap + 0x790) (bo=(struct *) 0x0000000000281d70) at line 975 in nidmap.c
>>> #5 0xffffffff00fd6d20 (orte_util_nidmap_init + 0x3dc) (buffer=(struct opal_buffer_t *) 0x0000000000241fc0) at line 141 in nidmap.c
>>> #6 0xffffffff01e298cc (rte_init + 0x2a0) () at line 153 in ess_env_module.c
>>> #7 0xffffffff00f9f28c (orte_init + 0x308) (pargc=(int *) 0x0000000000000000,pargv=(char ***) 0x0000000000000000,flags=32) at line 148 in orte_init.c
>>> #8 0xffffffff001a6f08 (ompi_mpi_init + 0x31c) (argc=1,argv=(char **) 0x000007fefffff348,requested=0,provided=(int *) 0x000007feffffe698) at line 464 in ompi_mpi_init.c
>>> #9 0xffffffff001ff79c (MPI_Init + 0x2b0) (argc=(int *) 0x000007feffffe814,argv=(char ***) 0x000007feffffe818) at line 84 in init.c
>>> #10 0x0000000000100ae4 (main + 0x44) (argc=1,argv=(char **) 0x000007fefffff348) at line 8 in mpiinitfinalize.c
>>> #11 0xffffffff00d2b81c (__libc_start_main + 0x194) (0x100aa0,0x1,0x7fefffff348,0x100d24,0x100d14,0x0)
>>> #12 0x000000000010094c (_start + 0x2c) ()
>>>
>>> The line 252 in opal/mca/db/hash/db_hash.c is:
>>>
>>>     case OPAL_UINT64:
>>>         if (NULL == data) {
>>>             OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);
>>>             return OPAL_ERR_BAD_PARAM;
>>>         }
>>>         kv->type = OPAL_UINT64;
>>>         kv->data.uint64 = *(uint64_t*)(data);   // !!! here !!!
>>>         break;
>>>
>>> My environment is:
>>>
>>>   Open MPI v1.8 branch r32447 (latest)
>>>   configure --enable-debug
>>>   SPARC-V9 (Fujitsu SPARC64 IXfx)
>>>   Linux (custom)
>>>   gcc 4.2.4
>>>
>>> I could not reproduce it with Open MPI trunk nor with Fujitsu compiler.
>>>
>>> Can this information help?
>>>
>>> Takahiro Kawashima,
>>> MPI development team,
>>> Fujitsu
>>>
>>>> Hi,
>>>>
>>>> I'm sorry once more to answer late, but the last two days our mail
>>>> server was down (hardware error).
>>>>
>>>>> Did you configure this --enable-debug?
>>>> Yes, I used the following command.
>>>>
>>>> ../openmpi-1.8.2rc3/configure --prefix=/usr/local/openmpi-1.8.2_64_gcc \
>>>>   --libdir=/usr/local/openmpi-1.8.2_64_gcc/lib64 \
>>>>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>>>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>>>   JAVA_HOME=/usr/local/jdk1.8.0 \
>>>>   LDFLAGS="-m64 -L/usr/local/gcc-4.9.0/lib/amd64" \
>>>>   CC="gcc" CXX="g++" FC="gfortran" \
>>>>   CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
>>>>   CPP="cpp" CXXCPP="cpp" \
>>>>   CPPFLAGS="" CXXCPPFLAGS="" \
>>>>   --enable-mpi-cxx \
>>>>   --enable-cxx-exceptions \
>>>>   --enable-mpi-java \
>>>>   --enable-heterogeneous \
>>>>   --enable-mpi-thread-multiple \
>>>>   --with-threads=posix \
>>>>   --with-hwloc=internal \
>>>>   --without-verbs \
>>>>   --with-wrapper-cflags="-std=c11 -m64" \
>>>>   --enable-debug \
>>>>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
>>>>
>>>>> If so, you should get a line number in the backtrace
>>>> I got them for gdb (see below), but not for "dbx".
>>>>
>>>> Kind regards
>>>>
>>>> Siegmar
>>>>
>>>>> On Aug 5, 2014, at 2:59 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm sorry to answer so late, but last week I didn't have Internet
>>>>>> access. In the meantime I've installed openmpi-1.8.2rc3 and I get
>>>>>> the same error.
>>>>>>
>>>>>>> This looks like the typical type of alignment error that we used
>>>>>>> to see when testing regularly on SPARC. :-\
>>>>>>>
>>>>>>> It looks like the error was happening in mca_db_hash.so. Could
>>>>>>> you get a stack trace / file+line number where it was failing
>>>>>>> in mca_db_hash? (i.e., the actual bad code will likely be under
>>>>>>> opal/mca/db/hash somewhere)
>>>>>> Unfortunately I don't get a file+line number from a file in
>>>>>> opal/mca/db/Hash.
>>>>>>
>>>>>> tyr small_prog 102 ompi_info | grep MPI:
>>>>>>   Open MPI: 1.8.2rc3
>>>>>> tyr small_prog 103 which mpicc
>>>>>> /usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
>>>>>> tyr small_prog 104 mpicc init_finalize.c
>>>>>> tyr small_prog 106 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
>>>>>> For information about new features see `help changes'
>>>>>> To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
>>>>>> Reading mpiexec
>>>>>> Reading ld.so.1
>>>>>> Reading libopen-rte.so.7.0.4
>>>>>> Reading libopen-pal.so.6.2.0
>>>>>> Reading libsendfile.so.1
>>>>>> Reading libpicl.so.1
>>>>>> Reading libkstat.so.1
>>>>>> Reading liblgrp.so.1
>>>>>> Reading libsocket.so.1
>>>>>> Reading libnsl.so.1
>>>>>> Reading libgcc_s.so.1
>>>>>> Reading librt.so.1
>>>>>> Reading libm.so.2
>>>>>> Reading libpthread.so.1
>>>>>> Reading libc.so.1
>>>>>> Reading libdoor.so.1
>>>>>> Reading libaio.so.1
>>>>>> Reading libmd.so.1
>>>>>> (dbx) check -all
>>>>>> access checking - ON
>>>>>> memuse checking - ON
>>>>>> (dbx) run -np 1 a.out
>>>>>> Running: mpiexec -np 1 a.out
>>>>>> (process id 27833)
>>>>>> Reading rtcapihook.so
>>>>>> Reading libdl.so.1
>>>>>> Reading rtcaudit.so
>>>>>> Reading libmapmalloc.so.1
>>>>>> Reading libgen.so.1
>>>>>> Reading libc_psr.so.1
>>>>>> Reading rtcboot.so
>>>>>> Reading librtc.so
>>>>>> Reading libmd_psr.so.1
>>>>>> RTC: Enabling Error Checking...
>>>>>> RTC: Running program...
>>>>>> Write to unallocated (wua) on thread 1: >>>>>> Attempting to write 1 byte at address 0xffffffff79f04000 >>>>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0 >>>>>> 0xffffffff55174da0: _readdir+0x0064: call >>>> _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80 >>>>>> (dbx) where >>>>>> current thread: t@1 >>>>>> =>[1] _readdir(0xffffffff79f00300, 0x2e6800, 0x4, 0x2d, 0x4, >>>> 0xffffffff79f00300), at 0xffffffff55174da0 >>>>>> [2] list_files_by_dir(0x100138fd8, 0xffffffff7fffd1f0, >>>>>> 0xffffffff7fffd1e8, >>>> 0xffffffff7fffd210, 0x0, 0xffffffff702a0010), at >>>>>> 0xffffffff63174594 >>>>>> [3] foreachfile_callback(0x100138fd8, 0xffffffff7fffd458, 0x0, 0x2e, >>>>>> 0x0, >>>> 0xffffffff702a0010), at 0xffffffff6317461c >>>>>> [4] foreach_dirinpath(0x1001d8a28, 0x0, 0xffffffff631745e0, >>>> 0xffffffff7fffd458, 0x0, 0xffffffff702a0010), at 0xffffffff63171684 >>>>>> [5] lt_dlforeachfile(0x1001d8a28, 0xffffffff6319656c, 0x0, 0x53, 0x2f, >>>> 0xf), at 0xffffffff63174748 >>>>>> [6] find_dyn_components(0x0, 0xffffffff6323b570, 0x0, 0x1, >>>> 0xffffffff7fffd6a0, 0xffffffff702a0010), at 0xffffffff63195e38 >>>>>> [7] mca_base_component_find(0x0, 0xffffffff6323b570, >>>>>> 0xffffffff6335e1b0, >>>> 0x0, 0xffffffff7fffd6a0, 0x1), at 0xffffffff631954d8 >>>>>> [8] mca_base_framework_components_register(0xffffffff6335e1c0, 0x0, >>>>>> 0x3e, >>>> 0x0, 0x3b, 0x100800), at 0xffffffff631b1638 >>>>>> [9] mca_base_framework_register(0xffffffff6335e1c0, 0x0, 0x2, >>>> 0xffffffff7fffd8d0, 0x0, 0xffffffff702a0010), at 0xffffffff631b24d4 >>>>>> [10] mca_base_framework_open(0xffffffff6335e1c0, 0x0, 0x2, >>>> 0xffffffff7fffd990, 0x0, 0xffffffff702a0010), at 0xffffffff631b25d0 >>>>>> [11] opal_init(0xffffffff7fffdd70, 0xffffffff7fffdd78, 0x100117c60, >>>> 0xffffffff7fffde58, 0x400, 0x100117c60), at >>>>>> 0xffffffff63153694 >>>>>> [12] orterun(0x4, 0xffffffff7fffde58, 0x2, 0xffffffff7fffdda0, 0x0, >>>> 0xffffffff702a0010), at 0x100005078 >>>>>> [13] main(0x4, 0xffffffff7fffde58, 0xffffffff7fffde80, 0x100117c60, >>>> 0x100000000, 0xffffffff6a700200), at 0x100003d68 >>>>>> (dbx) >>>>>> >>>>>> >>>>>> >>>>>> I get the following output with gdb. >>>>>> >>>>>> tyr small_prog 107 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>> /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec >>>>>> GNU gdb (GDB) 7.6.1 >>>>>> Copyright (C) 2013 Free Software Foundation, Inc. >>>>>> License GPLv3+: GNU GPL version 3 or later >>>> <http://gnu.org/licenses/gpl.html> >>>>>> This is free software: you are free to change and redistribute it. >>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show >>>>>> copying" >>>>>> and "show warranty" for details. >>>>>> This GDB was configured as "sparc-sun-solaris2.10". >>>>>> For bug reporting instructions, please see: >>>>>> <http://www.gnu.org/software/gdb/bugs/>... >>>>>> Reading symbols from >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done. 
>>>>>> (gdb) run -np 1 a.out >>>>>> Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 a.out >>>>>> [Thread debugging using libthread_db enabled] >>>>>> [New Thread 1 (LWP 1)] >>>>>> [New LWP 2 ] >>>>>> [tyr:27867] *** Process received signal *** >>>>>> [tyr:27867] Signal: Bus Error (10) >>>>>> [tyr:27867] Signal code: Invalid address alignment (1) >>>>>> [tyr:27867] Failing at address: ffffffff7fffd224 >>>>>> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_b >>>> acktrace_print+0x2c >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfa >>>> 0 >>>>>> /lib/sparcv9/libc.so.1:0xd8b98 >>>>>> /lib/sparcv9/libc.so.1:0xcc70c >>>>>> /lib/sparcv9/libc.so.1:0xcc918 >>>>>> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3e >>>> e8 [ Signal 10 (BUS)] >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_d >>>> b_base_store+0xc8 >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_u >>>> til_decode_pidmap+0x798 >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_u >>>> til_nidmap_init+0x3cc >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x22 >>>> 6c >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_i >>>> nit+0x308 >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_in >>>> it+0x31c >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0 >>>> x2a8 >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:main+0x20 >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:_start+0x7c >>>>>> [tyr:27867] *** End of error message *** >>>>>> -------------------------------------------------------------------------- >>>>>> mpiexec noticed that process rank 0 with PID 27867 on node tyr exited on >>>> signal 10 (Bus Error). 
>>>>>> -------------------------------------------------------------------------- >>>>>> [LWP 2 exited] >>>>>> [New Thread 2 ] >>>>>> [Switching to Thread 1 (LWP 1)] >>>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to >>>> satisfy query >>>>>> (gdb) bt >>>>>> #0 0xffffffff7f6173d0 in rtld_db_dlactivity () from >>>> /usr/lib/sparcv9/ld.so.1 >>>>>> #1 0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1 >>>>>> #2 0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1 >>>>>> #3 0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1 >>>>>> #4 0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1 >>>>>> #5 0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1 >>>>>> #6 0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1 >>>>>> #7 0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1 >>>>>> #8 0xffffffff7ec7746c in vm_close () >>>>>> from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6 >>>>>> #9 0xffffffff7ec74a4c in lt_dlclose () >>>>>> from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6 >>>>>> #10 0xffffffff7ec99b70 in ri_destructor (obj=0x1001ead30) >>>>>> at >>>> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:391 >>>>>> #11 0xffffffff7ec98488 in opal_obj_run_destructors (object=0x1001ead30) >>>>>> at ../../../../openmpi-1.8.2rc3/opal/class/opal_object.h:446 >>>>>> #12 0xffffffff7ec993ec in mca_base_component_repository_release ( >>>>>> component=0xffffffff7b023cf0 <mca_oob_tcp_component>) >>>>>> at >>>> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:244 >>>>>> #13 0xffffffff7ec9b734 in mca_base_component_unload ( >>>>>> component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1) >>>>>> at >>>> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:47 >>>>>> #14 0xffffffff7ec9b7c8 in mca_base_component_close ( >>>>>> component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1) >>>>>> at >>>> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:60 >>>>>> #15 0xffffffff7ec9b89c in mca_base_components_close (output_id=-1, >>>>>> components=0xffffffff7f12b430 <orte_oob_base_framework+80>, skip=0x0) >>>>>> ---Type <return> to continue, or q <return> to quit--- >>>>>> at >>>> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:86 >>>>>> #16 0xffffffff7ec9b804 in mca_base_framework_components_close ( >>>>>> framework=0xffffffff7f12b3e0 <orte_oob_base_framework>, skip=0x0) >>>>>> at >>>> ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:66 >>>>>> #17 0xffffffff7efae1e4 in orte_oob_base_close () >>>>>> at ../../../../openmpi-1.8.2rc3/orte/mca/oob/base/oob_base_frame.c:94 >>>>>> #18 0xffffffff7ecb28ac in mca_base_framework_close ( >>>>>> framework=0xffffffff7f12b3e0 <orte_oob_base_framework>) >>>>>> at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_framework.c:187 >>>>>> #19 0xffffffff7bf078c0 in rte_finalize () >>>>>> at >>>>>> ../../../../../openmpi-1.8.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:858 >>>>>> #20 0xffffffff7ef30a44 in orte_finalize () >>>>>> at ../../openmpi-1.8.2rc3/orte/runtime/orte_finalize.c:65 >>>>>> #21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0e8) >>>>>> at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/orterun.c:1096 >>>>>> #22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0e8) >>>>>> at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/main.c:13 >>>>>> (gdb) >>>>>> >>>>>> >>>>>> Is the 
above information helpful to track down the error? Do you need >>>>>> anything else? Thank you very much for any help in advance. >>>>>> >>>>>> >>>>>> Kind regards >>>>>> >>>>>> Siegmar >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Jul 25, 2014, at 2:08 AM, Siegmar Gross >>>> <siegmar.gr...@informatik.hs-fulda.de> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris >>>>>>>> 10 Sparc and I receive a bus error, if I run a small program. >>>>>>>> >>>>>>>> tyr hello_1 105 mpiexec -np 2 a.out >>>>>>>> [tyr:29164] *** Process received signal *** >>>>>>>> [tyr:29164] Signal: Bus Error (10) >>>>>>>> [tyr:29164] Signal code: Invalid address alignment (1) >>>>>>>> [tyr:29164] Failing at address: ffffffff7fffd1c4 >>>>>>>> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_b >>>> acktrace_print+0x2c >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd >>>> 0 >>>>>>>> /lib/sparcv9/libc.so.1:0xd8b98 >>>>>>>> /lib/sparcv9/libc.so.1:0xcc70c >>>>>>>> /lib/sparcv9/libc.so.1:0xcc918 >>>>>>>> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3e >>>> e8 [ Signal 10 (BUS)] >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_d >>>> b_base_store+0xc8 >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_u >>>> til_decode_pidmap+0x798 >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_u >>>> til_nidmap_init+0x3cc >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x22 >>>> 6c >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_i >>>> nit+0x308 >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_in >>>> it+0x31c >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0 >>>> x2a8 >>>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20 >>>>>>>> >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c >>>>>>>> [tyr:29164] *** End of error message *** >>>>>>>> ... >>>>>>>> >>>>>>>> >>>>>>>> I get the following output if I run the program in "dbx". >>>>>>>> >>>>>>>> ... >>>>>>>> RTC: Enabling Error Checking... >>>>>>>> RTC: Running program... >>>>>>>> Write to unallocated (wua) on thread 1: >>>>>>>> Attempting to write 1 byte at address 0xffffffff79f04000 >>>>>>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0 >>>>>>>> 0xffffffff55174da0: _readdir+0x0064: call >>>> _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80 >>>>>>>> (dbx) >>>>>>>> >>>>>>>> >>>>>>>> Hopefully the above output helps to fix the error. Can I provide >>>>>>>> anything else? Thank you very much for any help in advance. >>>>>>>> >>>>>>>> >>>>>>>> Kind regards >>>>>>>> >>>>>>>> Siegmar > _______________________________________________ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/08/15546.php
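The patch for v1.8 follows. For completeness, here is a rough sketch of the union approach mentioned in the quoted discussion; the type below is a simplified stand-in, not the actual OPAL header, and the values are arbitrary:

/* sketch only: simplified stand-in for the real OPAL type */
#include <stdint.h>
#include <stdio.h>

typedef union {
    uint64_t id;                 /* forces 8-byte alignment of the whole object */
    struct {
        uint32_t jobid;
        uint32_t vpid;
    } name;
} opal_process_name_t;

int main(void)
{
    opal_process_name_t p;       /* 8-byte aligned by construction */
    p.name.jobid = 123;          /* arbitrary values for the example */
    p.name.vpid  = 0;
    /* reading the other union member yields the 64-bit key;
     * no misaligned load is possible here */
    printf("key = %llu\n", (unsigned long long)p.id);
    return 0;
}

Because the union carries the alignment of its uint64_t member, a variable of this type can always be handed to code that reinterprets it as a 64-bit identifier without risking a SIGBUS on SPARC.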
Index: orte/util/nidmap.c =================================================================== --- orte/util/nidmap.c (revision 32449) +++ orte/util/nidmap.c (working copy) @@ -13,7 +13,6 @@ * Copyright (c) 2012-2014 Los Alamos National Security, LLC. * All rights reserved. * Copyright (c) 2013 Intel, Inc. All rights reserved - * * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ @@ -171,7 +170,9 @@ int rc; struct hostent *h; opal_buffer_t buf; - orte_process_name_t proc; + /* FIXME make sure the orte_process_name_t is 8 bytes aligned */ + opal_identifier_t _proc; + orte_process_name_t *proc = (orte_process_name_t *)&_proc; char *uri, *addr; char *proc_name; @@ -192,15 +193,15 @@ */ /* install the entry for the HNP */ - proc.jobid = ORTE_PROC_MY_NAME->jobid; - proc.vpid = 0; - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, - ORTE_DB_DAEMON_VPID, &proc.vpid, OPAL_UINT32))) { + proc->jobid = ORTE_PROC_MY_NAME->jobid; + proc->vpid = 0; + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, + ORTE_DB_DAEMON_VPID, &proc->vpid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); return rc; } addr = "HNP"; - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_HOSTNAME, addr, OPAL_STRING))) { ORTE_ERROR_LOG(rc); return rc; @@ -213,9 +214,9 @@ OBJ_CONSTRUCT(&buf, opal_buffer_t); for (i=0; i < num_nodes; i++) { /* define the vpid for this daemon */ - proc.vpid = i+1; + proc->vpid = i+1; /* store the hostname for the proc */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_HOSTNAME, nodes[i], OPAL_STRING))) { ORTE_ERROR_LOG(rc); return rc; @@ -223,7 +224,7 @@ /* the arch defaults to our arch so that non-hetero * case will yield correct behavior */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_ARCH, &opal_local_arch, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); return rc; @@ -244,7 +245,7 @@ */ /* construct the URI */ - orte_util_convert_process_name_to_string(&proc_name, &proc); + orte_util_convert_process_name_to_string(&proc_name, proc); asprintf(&uri, "%s;tcp://%s:%d", proc_name, addr, (int)orte_process_info.my_port); OPAL_OUTPUT_VERBOSE((2, orte_nidmap_output, "%s orte:util:build:daemon:nidmap node %s daemon %d addr %s uri %s", @@ -392,7 +393,9 @@ { int n; orte_vpid_t num_daemons; - orte_process_name_t daemon; + /* FIXME make sure the orte_process_name_t is 8 bytes aligned */ + opal_identifier_t _daemon; + orte_process_name_t *daemon = (orte_process_name_t *)&_daemon; opal_buffer_t buf; int rc=ORTE_SUCCESS; uint8_t oversub; @@ -432,10 +435,10 @@ } /* set the daemon jobid */ - daemon.jobid = ORTE_DAEMON_JOBID(ORTE_PROC_MY_NAME->jobid); + daemon->jobid = ORTE_DAEMON_JOBID(ORTE_PROC_MY_NAME->jobid); n=1; - while (OPAL_SUCCESS == (rc = opal_dss.unpack(&buf, &daemon.vpid, &n, ORTE_VPID))) { + while (OPAL_SUCCESS == (rc = opal_dss.unpack(&buf, &daemon->vpid, &n, ORTE_VPID))) { /* unpack and store the node's name */ n=1; if (ORTE_SUCCESS != (rc = opal_dss.unpack(&buf, &nodename, &n, OPAL_STRING))) { @@ -443,7 +446,7 @@ return rc; } /* we only need the hostname for our own 
error messages, so mark it as internal */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&daemon, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)daemon, OPAL_SCOPE_INTERNAL, ORTE_DB_HOSTNAME, nodename, OPAL_STRING))) { ORTE_ERROR_LOG(rc); return rc; @@ -452,9 +455,9 @@ opal_output_verbose(2, orte_nidmap_output, "%s storing nodename %s for daemon %s", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - nodename, ORTE_VPID_PRINT(daemon.vpid)); + nodename, ORTE_VPID_PRINT(daemon->vpid)); if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_NAME_WILDCARD, OPAL_SCOPE_INTERNAL, - nodename, &daemon.vpid, OPAL_UINT32))) { + nodename, &daemon->vpid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); return rc; } @@ -462,10 +465,10 @@ OPAL_OUTPUT_VERBOSE((2, orte_nidmap_output, "%s orte:util:decode:nidmap daemon %s node %s", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - ORTE_VPID_PRINT(daemon.vpid), nodename)); + ORTE_VPID_PRINT(daemon->vpid), nodename)); /* if this is my daemon, then store the data for me too */ - if (daemon.vpid == ORTE_PROC_MY_DAEMON->vpid) { + if (daemon->vpid == ORTE_PROC_MY_DAEMON->vpid) { if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_NON_PEER, ORTE_DB_HOSTNAME, nodename, OPAL_STRING))) { ORTE_ERROR_LOG(rc); @@ -473,7 +476,7 @@ } /* we may need our daemon vpid to be shared with non-peers */ if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_NON_PEER, - ORTE_DB_DAEMON_VPID, &daemon.vpid, OPAL_UINT32))) { + ORTE_DB_DAEMON_VPID, &daemon->vpid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); return rc; } @@ -498,9 +501,9 @@ opal_output_verbose(2, orte_nidmap_output, "%s storing alias %s for daemon %s", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - alias, ORTE_VPID_PRINT(daemon.vpid)); + alias, ORTE_VPID_PRINT(daemon->vpid)); if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_NAME_WILDCARD, OPAL_SCOPE_INTERNAL, - alias, &daemon.vpid, OPAL_UINT32))) { + alias, &daemon->vpid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); return rc; } @@ -524,13 +527,13 @@ ORTE_ERROR_LOG(rc); return rc; } - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&daemon, OPAL_SCOPE_NON_PEER, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)daemon, OPAL_SCOPE_NON_PEER, ORTE_DB_HOSTID, &hostid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); return rc; } /* if this is my daemon, then store it as my hostid as well */ - if (daemon.vpid == ORTE_PROC_MY_DAEMON->vpid) { + if (daemon->vpid == ORTE_PROC_MY_DAEMON->vpid) { if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_NON_PEER, ORTE_DB_HOSTID, &hostid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); @@ -885,7 +888,10 @@ orte_proc_state_t state; orte_app_idx_t app_idx; int32_t restarts; - orte_process_name_t proc, dmn; + /* FIXME make sure the orte_process_name_t is 8 bytes aligned */ + opal_identifier_t _proc, _dmn; + orte_process_name_t *proc = (orte_process_name_t *)&_proc; + orte_process_name_t *dmn = (orte_process_name_t *)&_dmn; char *hostname; uint8_t flag; opal_buffer_t *bptr; @@ -899,16 +905,16 @@ } /* set the daemon jobid */ - dmn.jobid = ORTE_DAEMON_JOBID(ORTE_PROC_MY_NAME->jobid); + dmn->jobid = ORTE_DAEMON_JOBID(ORTE_PROC_MY_NAME->jobid); n = 1; /* cycle through the buffer */ orte_process_info.num_local_peers = 0; - while (ORTE_SUCCESS == (rc = opal_dss.unpack(&buf, &proc.jobid, &n, ORTE_JOBID))) { + while (ORTE_SUCCESS == (rc = opal_dss.unpack(&buf, &proc->jobid, &n, ORTE_JOBID))) { OPAL_OUTPUT_VERBOSE((2, 
orte_nidmap_output, "%s orte:util:decode:pidmap working job %s", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - ORTE_JOBID_PRINT(proc.jobid))); + ORTE_JOBID_PRINT(proc->jobid))); /* unpack and store the number of procs */ n=1; @@ -916,9 +922,9 @@ ORTE_ERROR_LOG(rc); goto cleanup; } - proc.vpid = ORTE_VPID_INVALID; + proc->vpid = ORTE_VPID_INVALID; /* only useful to ourselves */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_NPROCS, &num_procs, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup; @@ -930,7 +936,7 @@ goto cleanup; } /* only of possible use to ourselves */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_NPROC_OFFSET, &offset, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup; @@ -939,12 +945,12 @@ * all data for this job has been read */ n=1; - while (OPAL_SUCCESS == (rc = opal_dss.unpack(&buf, &proc.vpid, &n, ORTE_VPID))) { - if (ORTE_VPID_INVALID == proc.vpid) { + while (OPAL_SUCCESS == (rc = opal_dss.unpack(&buf, &proc->vpid, &n, ORTE_VPID))) { + if (ORTE_VPID_INVALID == proc->vpid) { break; } n=1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(&buf, &dmn.vpid, &n, ORTE_VPID))) { + if (ORTE_SUCCESS != (rc = opal_dss.unpack(&buf, &dmn->vpid, &n, ORTE_VPID))) { ORTE_ERROR_LOG(rc); goto cleanup; } @@ -965,15 +971,15 @@ goto cleanup; } #endif - if (proc.jobid == ORTE_PROC_MY_NAME->jobid && - proc.vpid == ORTE_PROC_MY_NAME->vpid) { + if (proc->jobid == ORTE_PROC_MY_NAME->jobid && + proc->vpid == ORTE_PROC_MY_NAME->vpid) { /* set mine */ orte_process_info.my_local_rank = local_rank; orte_process_info.my_node_rank = node_rank; /* if we are the local leader (i.e., local_rank=0), then record it */ if (0 == local_rank) { if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_INTERNAL, - OPAL_DB_LOCALLDR, (opal_identifier_t*)&proc, OPAL_ID_T))) { + OPAL_DB_LOCALLDR, (opal_identifier_t*)proc, OPAL_ID_T))) { ORTE_ERROR_LOG(rc); goto cleanup; } @@ -983,14 +989,14 @@ orte_process_info.cpuset = strdup(cpu_bitmap); } #endif - } else if (proc.jobid == ORTE_PROC_MY_NAME->jobid && - dmn.vpid == ORTE_PROC_MY_DAEMON->vpid) { + } else if (proc->jobid == ORTE_PROC_MY_NAME->jobid && + dmn->vpid == ORTE_PROC_MY_DAEMON->vpid) { /* if we share a daemon, then add to my local peers */ orte_process_info.num_local_peers++; /* if this is the local leader (i.e., local_rank=0), then record it */ if (0 == local_rank) { if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_INTERNAL, - OPAL_DB_LOCALLDR, (opal_identifier_t*)&proc, OPAL_ID_T))) { + OPAL_DB_LOCALLDR, (opal_identifier_t*)proc, OPAL_ID_T))) { ORTE_ERROR_LOG(rc); goto cleanup; } @@ -1020,18 +1026,18 @@ goto cleanup; } /* store the values in the database - again, these are for our own internal use */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_LOCALRANK, &local_rank, ORTE_LOCAL_RANK))) { ORTE_ERROR_LOG(rc); goto cleanup; } - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_NODERANK, &node_rank, ORTE_NODE_RANK))) { ORTE_ERROR_LOG(rc); goto cleanup; 
} #if OPAL_HAVE_HWLOC - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, OPAL_DB_CPUSET, cpu_bitmap, OPAL_STRING))) { ORTE_ERROR_LOG(rc); goto cleanup; @@ -1044,25 +1050,25 @@ * for ourself in the database * as we already did so during startup */ - if (proc.jobid != ORTE_PROC_MY_NAME->jobid || - proc.vpid != ORTE_PROC_MY_NAME->vpid) { + if (proc->jobid != ORTE_PROC_MY_NAME->jobid || + proc->vpid != ORTE_PROC_MY_NAME->vpid) { /* store the data for this proc - the location of a proc is something * we would potentially need to share with a non-peer */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_NON_PEER, - ORTE_DB_DAEMON_VPID, &dmn.vpid, OPAL_UINT32))) { + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_NON_PEER, + ORTE_DB_DAEMON_VPID, &dmn->vpid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup; } /* in a singleton comm_spawn, we can be passed the name of a daemon, which * means that the proc's parent is invalid - check and avoid the rest of * this logic in that case */ - if (ORTE_VPID_INVALID != dmn.vpid) { + if (ORTE_VPID_INVALID != dmn->vpid) { /* if coprocessors were detected, lookup and store the hostid for this proc */ if (orte_coprocessors_detected) { /* lookup the hostid for this daemon */ vptr = &hostid; - if (ORTE_SUCCESS != (rc = opal_db.fetch((opal_identifier_t*)&dmn, ORTE_DB_HOSTID, + if (ORTE_SUCCESS != (rc = opal_db.fetch((opal_identifier_t*)dmn, ORTE_DB_HOSTID, (void**)&vptr, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup; @@ -1070,29 +1076,29 @@ OPAL_OUTPUT_VERBOSE((2, orte_nidmap_output, "%s FOUND HOSTID %s FOR DAEMON %s", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - ORTE_VPID_PRINT(hostid), ORTE_VPID_PRINT(dmn.vpid))); + ORTE_VPID_PRINT(hostid), ORTE_VPID_PRINT(dmn->vpid))); /* store it as hostid for this proc */ - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_NON_PEER, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_NON_PEER, ORTE_DB_HOSTID, &hostid, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup; } } /* lookup and store the hostname for this proc */ - if (ORTE_SUCCESS != (rc = opal_db.fetch_pointer((opal_identifier_t*)&dmn, ORTE_DB_HOSTNAME, + if (ORTE_SUCCESS != (rc = opal_db.fetch_pointer((opal_identifier_t*)dmn, ORTE_DB_HOSTNAME, (void**)&hostname, OPAL_STRING))) { ORTE_ERROR_LOG(rc); goto cleanup; } - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_NON_PEER, + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_NON_PEER, ORTE_DB_HOSTNAME, hostname, OPAL_STRING))) { ORTE_ERROR_LOG(rc); goto cleanup; } } /* store this procs global rank - only used by us */ - global_rank = proc.vpid + offset; - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL, + global_rank = proc->vpid + offset; + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_INTERNAL, ORTE_DB_GLOBAL_RANK, &global_rank, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup; @@ -1101,8 +1107,8 @@ /* update our own global rank - this is something we will need * to share with non-peers */ - global_rank = proc.vpid + offset; - if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_NON_PEER, + global_rank = proc->vpid + offset; + if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)proc, OPAL_SCOPE_NON_PEER, 
ORTE_DB_GLOBAL_RANK, &global_rank, OPAL_UINT32))) { ORTE_ERROR_LOG(rc); goto cleanup;
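P.S. Assuming a C11 compiler (the wrapper flags quoted above already pass -std=c11), a static assertion is a cheap way to catch this class of problem at build time. A rough sketch, again with simplified stand-in types rather than the real headers:

/* sketch: compile-time alignment check, assuming a C11 compiler;
 * the types are simplified stand-ins for the real ORTE/OPAL declarations */
#include <stdint.h>
#include <stdalign.h>
#include <assert.h>

typedef struct { uint32_t jobid; uint32_t vpid; } orte_process_name_t;
typedef union  { uint64_t id; orte_process_name_t name; } opal_process_name_t;

/* passes: the union inherits the 8-byte alignment of its uint64_t member */
static_assert(alignof(opal_process_name_t) >= alignof(uint64_t),
              "opal_process_name_t is safe to reinterpret as a 64-bit id");

/* the same assertion on the bare orte_process_name_t would fail on the
 * usual 64-bit ABIs (alignment 4), which is exactly the bug reported here */

int main(void) { return 0; }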