Thanks Ralph! It did fix the problem.

Cheers,

Gilles

On 2014/10/01 3:04, Ralph Castain wrote:
> I fixed this in r32818 - the components shouldn't be passing back success if
> the requested info isn't found. Hope that fixes the problem.
>
>
> On Sep 30, 2014, at 1:54 AM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> the dynamic/spawn test from the ibm test suite crashes if the openib btl
>> is detected
>> (the test can be run on one node with an IB port)
>>
>> here is what happens:
>>
>> in mca_btl_openib_proc_create,
>> the macro
>> OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
>> proc, &message, &msg_size);
>> does not find any information *but*
>> rc is OPAL_SUCCESS
>> msg_size is not updated (i.e. left uninitialized)
>> message is not updated (i.e. left uninitialized)
>>
>> then, if msg_size is uninitialized with a nonzero value, and if message
>> is uninitialized with
>> an invalid address, a crash will occur when accessing message.
>>
>> /* i am not debating here the fact that there is no information returned,
>> i am simply discussing the crash */
>>
>> a simple workaround is to initialize msg_size to zero.
>>
>> that being said, is this the correct fix ?
>>
>> one possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
>> like this:
>>
>> /* from opal/mca/pmix/pmix.h */
>> #define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                        \
>>     do {                                                              \
>>         opal_value_t *kv;                                             \
>>         if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,     \
>>                                                  (s), &kv))) {        \
>>             if (NULL != kv) {                                         \
>>                 *(d) = kv->data.bo.bytes;                             \
>>                 *(sz) = kv->data.bo.size;                             \
>>                 kv->data.bo.bytes = NULL; /* protect the data */      \
>>                 OBJ_RELEASE(kv);                                      \
>>             } else {                                                  \
>>                 *(sz) = 0;                                            \
>>                 (r) = OPAL_ERR_NOT_FOUND;                             \
>>             }                                                         \
>>         }                                                             \
>>     } while(0);
>>
>> /*
>> *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant, *(sz)
>> *or* (r) could be set
>> */
>>
>> and another alternate fix is to update the end of the native_get
>> function like this:
>>
>> /* from opal/mca/pmix/native/pmix_native.c */
>>
>> if (found) {
>>     return OPAL_SUCCESS;
>> }
>> *kv = NULL;
>> if (OPAL_SUCCESS == rc) {
>>     if (OPAL_SUCCESS == ret) {
>>         rc = OPAL_ERR_NOT_FOUND;
>>     } else {
>>         rc = ret;
>>     }
>> }
>> return rc;
>>
>> Could you please advise ?
>>
>> Cheers,
>>
>> Gilles
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15942.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15950.php
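
For anyone reading this thread later, here is a minimal standalone sketch of
the pattern discussed above: a lookup that reports "not found" explicitly and
clears its output parameters, plus a caller that initializes its locals
defensively. This is not the Open MPI code and not the change made in r32818.
Every demo_* name, the DEMO_* status codes, and the one-entry store are
hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes standing in for OPAL_SUCCESS / OPAL_ERR_NOT_FOUND. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_FOUND = -13 };

/* Hypothetical one-entry key/value store standing in for the modex data. */
static const char demo_key[]  = "btl.openib";
static const char demo_blob[] = "port-info";

/*
 * Look up `key`. On a hit, set *data and *size and return DEMO_SUCCESS.
 * On a miss, clear both outputs and return DEMO_ERR_NOT_FOUND, so the
 * caller never sees success paired with stale or uninitialized values.
 */
int demo_modex_recv(const char *key, const void **data, size_t *size)
{
    assert(NULL != key && NULL != data && NULL != size);

    if (0 == strcmp(key, demo_key)) {
        *data = demo_blob;
        *size = sizeof(demo_blob);
        return DEMO_SUCCESS;
    }
    *data = NULL;
    *size = 0;
    return DEMO_ERR_NOT_FOUND;
}

/*
 * Caller side: returns the number of bytes that are safe to read,
 * or 0 when nothing was published for `key`.
 */
size_t demo_proc_create(const char *key)
{
    const void *message = NULL;  /* defensive initialization (the workaround) */
    size_t msg_size = 0;

    int rc = demo_modex_recv(key, &message, &msg_size);
    if (DEMO_SUCCESS != rc || NULL == message) {
        return 0;                /* never dereference message on a miss */
    }
    return msg_size;
}
```

With this shape, demo_proc_create("btl.tcp") returns 0 instead of reading
through a garbage pointer. Setting both the return code and the outputs on a
miss is redundant in the same sense noted in the quoted message, but it keeps
the caller safe whichever of the two it checks.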