This usually happens when a string that belongs to the MCA system is freed
elsewhere. Can you find out the name of the variable that is being destructed
in frame 2?
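
To illustrate the kind of ownership problem I mean, here is a minimal sketch. It is not the real MCA code: toy_var_t and the toy_* helpers are hypothetical names made up for this example. The point is only that a registry whose destructor frees a string should own a private copy of it, so a free elsewhere cannot leave it holding a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a registered variable; NOT the real mca_base_var_t. */
typedef struct {
    char *name; /* owned by the toy registry after registration */
} toy_var_t;

/* Portable string copy (avoids relying on POSIX strdup). */
static char *toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (NULL != copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Register by copying the caller's string, so the registry owns its buffer. */
static int toy_var_register(toy_var_t *var, const char *name)
{
    var->name = toy_strdup(name);
    return (NULL != var->name) ? 0 : -1;
}

/* Destructor: frees only memory the registry allocated itself. */
static void toy_var_destruct(toy_var_t *var)
{
    free(var->name);
    var->name = NULL;
}

/* Returns 0 if the caller can free its string without breaking the registry. */
static int toy_demo(void)
{
    toy_var_t var;
    char *caller_string = toy_strdup("example_param");

    if (NULL == caller_string) {
        return -1;
    }
    if (0 != toy_var_register(&var, caller_string)) {
        free(caller_string);
        return -1;
    }

    free(caller_string); /* safe: the registry holds a separate copy */
    assert(0 == strcmp(var.name, "example_param"));

    toy_var_destruct(&var); /* frees the registry's copy exactly once */
    return (NULL == var.name) ? 0 : -1;
}
```

If toy_var_register stored the caller's pointer instead of a copy, the free in toy_demo would leave var.name dangling and toy_var_destruct would free it a second time. That is the failure pattern I suspect here, though I have not confirmed it for this crash.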

-Nathan Hjelm
Application Readiness, HPC-5, LANL

On Tue, Dec 03, 2013 at 02:53:29PM +0200, Mike Dubman wrote:
>    Hi,
>    We observe a crash during shmem_finalize() (in trunk) with the new MCA
>    framework.
>    After investigation, we found that the MCA teardown process can access
>    previously released memory (reproduced with the oshmem_hello_c.c test).
>    #0 0x00007fffed3d51d0 in ?? ()
>    #1 <signal handler called>
>    #2 0x00007ffff710e21e in var_destructor (var=0x6fa7e0) at
>    mca_base_var.c:1605
>    #3 0x00007ffff710ae99 in opal_obj_run_destructors (object=0x6fa7e0) at
>    ../../../opal/class/opal_object.h:448
>    #4 0x00007ffff710ca18 in mca_base_var_finalize () at mca_base_var.c:954
>    #5 0x00007ffff710a7e2 in mca_base_param_finalize () at
>    mca_base_param.c:643
>    #6 0x00007ffff70e08e2 in opal_finalize_util () at
>    runtime/opal_finalize.c:77
>    #7 0x00007ffff7aa5319 in ompi_mpi_finalize () at
>    runtime/ompi_mpi_finalize.c:407
>    #8 0x00007ffff7d900cc in oshmem_shmem_finalize () at
>    runtime/oshmem_shmem_finalize.c:75
>    #9 0x00007ffff7d91119 in shmem_finalize () at shmem_finalize.c:24
>    #10 0x00007ffff7d89b8f in __do_global_dtors_aux () from
>    /install/lib/libshmem.so.0
>    #11 0x0000000000000000 in ?? ()
>    The crash can be resolved by the following patch:
>    diff --git a/opal/mca/base/mca_base_var.c b/opal/mca/base/mca_base_var.c
>    index 9966627..48028d8 100644
>    --- a/opal/mca/base/mca_base_var.c
>    +++ b/opal/mca/base/mca_base_var.c
>    @@ -773,7 +773,7 @@ static int var_find_by_name (const char *full_name,
>    int *index, bool invalidok)
>     
>         (void) var_get ((int)(uintptr_t) tmp, &var, false);
>     
>    -    if (invalidok || VAR_IS_VALID(var[0])) {
>    +    if (VAR_IS_VALID(var[0])) {
>             *index = (int)(uintptr_t) tmp;
>             return OPAL_SUCCESS;
>         }
>    I'm not sure we understand yet why it fixes the problem or what the
>    race is.
>    Could someone with knowledge of the MCA flows look at it and comment?
>    The "invalidok" argument was introduced by Jeff's commit.
>    Thanks
>    M

