I fixed this in r32818 - the components shouldn't be passing back success if the requested info isn't found. Hope that fixes the problem.
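
A minimal, self-contained sketch of the contract described above. This is not the actual r32818 change: `demo_get`, `demo_message_size`, `DEMO_SUCCESS`, `DEMO_ERR_NOT_FOUND` and the one-entry store are hypothetical stand-ins for the real component `get` function and the OPAL status codes. It only illustrates the idea that a lookup miss is reported as "not found" instead of success, and that the outputs are reset so a careless caller never sees garbage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes standing in for OPAL_SUCCESS / OPAL_ERR_NOT_FOUND. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_FOUND = -1 };

/* Hypothetical one-entry key/value store standing in for the modex data. */
static const char demo_key[] = "btl.openib";
static const char demo_val[] = "port-info";

/*
 * Look up `key`. On success, *data and *size describe the stored value.
 * On a miss, return DEMO_ERR_NOT_FOUND (never success) and still reset the
 * outputs, so a caller that forgets to check the return code sees NULL/0.
 */
static int demo_get(const char *key, const char **data, size_t *size)
{
    if (0 == strcmp(key, demo_key)) {
        *data = demo_val;
        *size = sizeof(demo_val) - 1;   /* length without the trailing NUL */
        return DEMO_SUCCESS;
    }
    *data = NULL;
    *size = 0;
    return DEMO_ERR_NOT_FOUND;
}

/* Caller side: initialize the outputs and only use them on success. */
static size_t demo_message_size(const char *key)
{
    const char *message = NULL;
    size_t msg_size = 0;

    if (DEMO_SUCCESS != demo_get(key, &message, &msg_size)) {
        return 0;                       /* nothing was published for this key */
    }
    return msg_size;
}
```

With this shape, calling `demo_message_size` with an unknown key returns 0 rather than reading an uninitialized pointer, which is the crash pattern reported in the quoted message below.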

On Sep 30, 2014, at 1:54 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> Folks,
>
> the dynamic/spawn test from the ibm test suite crashes if the openib btl
> is detected
> (the test can be run on one node with an IB port)
>
> here is what happens:
>
> in mca_btl_openib_proc_create, the macro
>     OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
>                     proc, &message, &msg_size);
> does not find any information *but*
>     rc is OPAL_SUCCESS
>     msg_size is not updated (i.e. left uninitialized)
>     message is not updated (i.e. left uninitialized)
>
> then, if msg_size is left uninitialized with a nonzero value and message
> is left uninitialized with an invalid address, a crash will occur when
> accessing message.
>
> /* i am not debating here the fact that there is no information returned,
> i am simply discussing the crash */
>
> a simple workaround is to initialize msg_size to zero.
>
> that being said, is this the correct fix?
>
> one possible alternative fix is to update the OPAL_MODEX_RECV_STRING macro
> like this:
>
> /* from opal/mca/pmix/pmix.h */
> #define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                            \
>     do {                                                                  \
>         opal_value_t *kv;                                                 \
>         if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,         \
>                                                  (s), &kv))) {            \
>             if (NULL != kv) {                                             \
>                 *(d) = kv->data.bo.bytes;                                 \
>                 *(sz) = kv->data.bo.size;                                 \
>                 kv->data.bo.bytes = NULL; /* protect the data */          \
>                 OBJ_RELEASE(kv);                                          \
>             } else {                                                      \
>                 *(sz) = 0;                                                \
>                 (r) = OPAL_ERR_NOT_FOUND;                                 \
>             }                                                             \
>         }                                                                 \
>     } while(0);
>
> /*
> *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant:
> either *(sz) *or* (r) could be set
> */
>
> and another alternative fix is to update the end of the native_get
> function like this:
>
> /* from opal/mca/pmix/native/pmix_native.c */
>
>     if (found) {
>         return OPAL_SUCCESS;
>     }
>     *kv = NULL;
>     if (OPAL_SUCCESS == rc) {
>         if (OPAL_SUCCESS == ret) {
>             rc = OPAL_ERR_NOT_FOUND;
>         } else {
>             rc = ret;
>         }
>     }
>     return rc;
>
> Could you please advise?
>
> Cheers,
>
> Gilles
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/09/15942.php