I fixed this in r32818 - the components shouldn't be passing back success if 
the requested info isn't found. Hope that fixes the problem.
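
To illustrate the contract I mean, here is a minimal standalone sketch. It is
not the r32818 change itself: demo_get, demo_store and the DEMO_* codes are
hypothetical stand-ins for the pmix get path and the OPAL status codes. The
point is only that a miss returns a not-found error and leaves the output
arguments in a defined state, so a caller never reads garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, standing in for OPAL_SUCCESS / OPAL_ERR_NOT_FOUND. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_FOUND = -1 };

/* Hypothetical key/value entry, standing in for published modex data. */
struct demo_entry {
    const char *key;
    const void *bytes;
    size_t size;
};

/* Only one component has published data in this toy store. */
static const struct demo_entry demo_store[] = {
    { "btl.tcp", "tcp-info", 8 },
};

/* Look up key. On a miss, return DEMO_ERR_NOT_FOUND (never success) and
 * leave *data == NULL and *size == 0 so the outputs are always defined. */
static int demo_get(const char *key, const void **data, size_t *size)
{
    size_t i;

    assert(NULL != key && NULL != data && NULL != size);
    *data = NULL;
    *size = 0;
    for (i = 0; i < sizeof(demo_store) / sizeof(demo_store[0]); i++) {
        if (0 == strcmp(demo_store[i].key, key)) {
            *data = demo_store[i].bytes;
            *size = demo_store[i].size;
            return DEMO_SUCCESS;
        }
    }
    return DEMO_ERR_NOT_FOUND;
}

/* Caller side: initialize the outputs anyway and check rc before use. */
static size_t demo_peer_info_size(const char *key)
{
    const void *message = NULL;
    size_t msg_size = 0;

    if (DEMO_SUCCESS != demo_get(key, &message, &msg_size)) {
        return 0; /* nothing published: do not touch message */
    }
    return msg_size;
}
```

With this shape, a lookup for a key nobody published (say "btl.openib" in the
toy store) fails cleanly instead of handing back success with stale pointers.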


On Sep 30, 2014, at 1:54 AM, Gilles Gouaillardet 
<gilles.gouaillar...@iferc.org> wrote:

> Folks,
> 
> the dynamic/spawn test from the ibm test suite crashes if the openib btl
> is detected
> (the test can be run on one node with an IB port)
> 
> here is what happens :
> 
> in mca_btl_openib_proc_create,
> the macro
>    OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
>                    proc, &message, &msg_size);
> does not find any information *but*
> rc is OPAL_SUCCESS
> msg_size is not updated (i.e. left uninitialized)
> message is not updated (i.e. left uninitialized)
> 
> then, if msg_size is left uninitialized with a non-zero value, and if
> message is left uninitialized with
> an invalid address, a crash will occur when accessing message.
> 
> /* i am not debating here the fact that there is no information returned,
> i am simply discussing the crash */
> 
> a simple workaround is to initialize msg_size to zero.
> 
> that being said, is this the correct fix ?
> 
> one possible alternative fix is to update the OPAL_MODEX_RECV_STRING macro
> like this :
> 
> /* from opal/mca/pmix/pmix.h */
> #define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                          \
>    do {                                                                \
>        opal_value_t *kv;                                               \
>        if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,       \
>                                                 (s), &kv))) {          \
>            if (NULL != kv) {                                           \
>                *(d) = kv->data.bo.bytes;                               \
>                *(sz) = kv->data.bo.size;                               \
>                kv->data.bo.bytes = NULL; /* protect the data */        \
>                OBJ_RELEASE(kv);                                        \
>            } else {                                                    \
>                *(sz) = 0;                                              \
>                (r) = OPAL_ERR_NOT_FOUND;                               \
>            }                                                           \
>        }                                                               \
>    } while(0);
> 
> /*
> *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant, *(sz)
> *or* (r) could be set
> */
> 
> and another alternative fix is to update the end of the native_get
> function like this :
> 
> /* from opal/mca/pmix/native/pmix_native.c */
> 
>    if (found) {
>        return OPAL_SUCCESS;
>    }
>    *kv = NULL;
>    if (OPAL_SUCCESS == rc) {
>        if (OPAL_SUCCESS == ret) {
>            rc = OPAL_ERR_NOT_FOUND;
>        } else {
>            rc = ret;
>        }
>    }
>    return rc;
> 
> Could you please advise ?
> 
> Cheers,
> 
> Gilles
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15942.php
