Thanks, Ralph!

It did fix the problem.

Cheers,

Gilles

On 2014/10/01 3:04, Ralph Castain wrote:
> I fixed this in r32818 - the components shouldn't be passing back success if 
> the requested info isn't found. Hope that fixes the problem.
>
>
> On Sep 30, 2014, at 1:54 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> The dynamic/spawn test from the IBM test suite crashes if the openib BTL
>> is detected (the test can be run on a single node with an IB port).
>>
>> Here is what happens:
>>
>> In mca_btl_openib_proc_create, the macro
>>    OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
>>                    proc, &message, &msg_size);
>> does not find any information, *but*
>> rc is OPAL_SUCCESS,
>> msg_size is not updated (i.e. left uninitialized), and
>> message is not updated (i.e. left uninitialized).
>>
>> Then, if the uninitialized msg_size holds a nonzero value and the
>> uninitialized message holds an invalid address, a crash occurs when
>> message is accessed.
>>
>> /* I am not debating here the fact that no information is returned;
>> I am simply discussing the crash. */
>>
>> A simple workaround is to initialize msg_size to zero.
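A minimal, self-contained sketch of that workaround, assuming a hypothetical getter (buggy_modex_recv) that, like the OPAL_MODEX_RECV call above, reports success without writing its outputs. SKETCH_SUCCESS is a stand-in for OPAL_SUCCESS, not the real OPAL definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OPAL_SUCCESS; the real value comes from
 * OPAL's headers. */
#define SKETCH_SUCCESS 0

/* Hypothetical getter reproducing the reported behaviour: it returns
 * success but never writes *data or *size. */
static int buggy_modex_recv(void **data, size_t *size)
{
    (void)data;
    (void)size;
    return SKETCH_SUCCESS;
}

/* Workaround: initialize both outputs before the call so that
 * "success, but nothing written" becomes an empty message instead of
 * stack garbage. Returns how many bytes may be read from *out_msg. */
static size_t recv_with_workaround(void **out_msg)
{
    void *message = NULL; /* stays NULL if the getter writes nothing */
    size_t msg_size = 0;  /* the workaround itself */
    int rc;

    assert(NULL != out_msg);
    rc = buggy_modex_recv(&message, &msg_size);
    if (SKETCH_SUCCESS != rc || 0 == msg_size || NULL == message) {
        *out_msg = NULL;
        return 0;
    }
    *out_msg = message;
    return msg_size;
}
```

Calling recv_with_workaround() yields 0 bytes and a NULL message instead of an arbitrary size and pointer. This avoids the crash, but the caller still cannot tell "nothing was published" apart from an empty message.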
>>
>> That being said, is this the correct fix?
>>
>> One possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
>> like this:
>>
>> /* from opal/mca/pmix/pmix.h */
>> #define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                          \
>>    do {                                                                \
>>        opal_value_t *kv;                                               \
>>        if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,       \
>>                                                 (s), &kv))) {          \
>>            if (NULL != kv) {                                           \
>>                *(d) = kv->data.bo.bytes;                               \
>>                *(sz) = kv->data.bo.size;                               \
>>                kv->data.bo.bytes = NULL; /* protect the data */        \
>>                OBJ_RELEASE(kv);                                        \
>>            } else {                                                    \
>>                *(sz) = 0;                                              \
>>                (r) = OPAL_ERR_NOT_FOUND;                               \
>>            }                                                           \
>>        }                                                               \
>>    } while (0)
>>
>> /*
>> Setting both *(sz) = 0 and (r) = OPAL_ERR_NOT_FOUND can be seen as
>> redundant; setting either *(sz) *or* (r) would be enough.
>> */
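A minimal sketch of the same idea as the macro change above, made self-contained with hypothetical stand-ins: sketch_value_t models only the byte-object part of opal_value_t, sketch_get models a getter that returns success with a NULL kv, and free() stands in for OBJ_RELEASE. None of these are the real OPAL definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for OPAL_SUCCESS and OPAL_ERR_NOT_FOUND. */
#define SKETCH_SUCCESS        0
#define SKETCH_ERR_NOT_FOUND (-13)

/* Hypothetical model of the byte-object fields of opal_value_t. */
typedef struct {
    void  *bytes;
    size_t size;
} sketch_value_t;

/* Hypothetical getter reproducing the problem: success, but no value. */
static int sketch_get(const char *key, sketch_value_t **kv)
{
    (void)key;
    *kv = NULL;
    return SKETCH_SUCCESS;
}

/* Same shape as the proposed OPAL_MODEX_RECV_STRING change: a NULL kv
 * after a "successful" get becomes NOT_FOUND with a zero size. */
#define SKETCH_MODEX_RECV_STRING(r, s, d, sz)                     \
    do {                                                          \
        sketch_value_t *kv_;                                      \
        if (SKETCH_SUCCESS == ((r) = sketch_get((s), &kv_))) {    \
            if (NULL != kv_) {                                    \
                *(d) = kv_->bytes;                                \
                *(sz) = kv_->size;                                \
                kv_->bytes = NULL; /* caller now owns the data */ \
                free(kv_);         /* stands in for OBJ_RELEASE */\
            } else {                                              \
                *(sz) = 0;                                        \
                (r) = SKETCH_ERR_NOT_FOUND;                       \
            }                                                     \
        }                                                         \
    } while (0)

/* Runs the macro for one key; *size is written whenever the getter
 * itself succeeded. */
static int sketch_recv(const char *key, void **data, size_t *size)
{
    int rc;

    assert(NULL != data && NULL != size);
    SKETCH_MODEX_RECV_STRING(rc, key, data, size);
    return rc;
}
```

With this shape, a lookup for a key that was never published returns SKETCH_ERR_NOT_FOUND and sets the size to 0, so either check alone is enough for the caller to avoid touching the message.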
>>
>> Another alternate fix is to update the end of the native_get
>> function like this:
>>
>> /* from opal/mca/pmix/native/pmix_native.c */
>>
>>    if (found) {
>>        return OPAL_SUCCESS;
>>    }
>>    *kv = NULL;
>>    if (OPAL_SUCCESS == rc) {
>>        if (OPAL_SUCCESS == ret) {
>>            rc = OPAL_ERR_NOT_FOUND;
>>        } else {
>>            rc = ret;
>>        }
>>    }
>>    return rc;
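A minimal sketch of the function-level alternative, assuming a hypothetical in-memory table in place of the real PMIx lookup. The rc and ret parameters are assumptions that model two earlier status codes inside native_get; their exact origin in pmix_native.c is not shown here. The point illustrated is the rule from Ralph's reply: never report success when the requested info isn't found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for OPAL_SUCCESS, OPAL_ERR_NOT_FOUND and some
 * other failure code; the real values come from OPAL's headers. */
#define SKETCH_SUCCESS          0
#define SKETCH_ERR_NOT_FOUND  (-13)
#define SKETCH_ERR_UNREACH    (-12)

typedef struct {
    const char *key;
    int         value;
} sketch_kv_t;

/* Hypothetical lookup mirroring the end of native_get: on a miss, *kv
 * is cleared and an earlier failure (ret) is preferred over NOT_FOUND,
 * but SUCCESS is never returned. */
static int sketch_native_get(const sketch_kv_t *table, size_t n,
                             const char *key, int rc, int ret,
                             const sketch_kv_t **kv)
{
    bool found = false;
    size_t i;

    assert(NULL != kv);
    for (i = 0; i < n; i++) {
        if (0 == strcmp(table[i].key, key)) {
            *kv = &table[i];
            found = true;
            break;
        }
    }

    if (found) {
        return SKETCH_SUCCESS;
    }
    *kv = NULL;
    if (SKETCH_SUCCESS == rc) {
        rc = (SKETCH_SUCCESS == ret) ? SKETCH_ERR_NOT_FOUND : ret;
    }
    return rc;
}
```

The keys used with such a table (for example "btl.openib") are illustrative only; in this sketch a miss yields either the earlier error or SKETCH_ERR_NOT_FOUND, and *kv is always NULL in that case.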
>>
>> Could you please advise?
>>
>> Cheers,
>>
>> Gilles
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15942.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15950.php
