Actually, if I reuse id's in equivalent calls like this:

...
'modex' block;
'modex' block;
'modex' block;
...

or

...
'barrier' block;
'barrier' block;
'barrier' block;
...

there is no hanging. The hang only occurs if this "reusing" follows the use
of another collective id, in the way I wrote in the first letter:

...
'modex' block;
'barrier' block;
'modex' block; <- hangs
...

or this way:

...
'barrier' block;
'modex' block;
'barrier' block; <- hangs
...

If I use a different collective id while calling modex (1, 2, ..., but not
0 == orte_process_info.peer_modex), that also won't work, unfortunately.
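Roughly, that attempt looked like the sketch below - the literal id value 1 is
just an example of "some id other than peer_modex", the rest mirrors the
'modex' block quoted further down:

coll = OBJ_NEW(orte_grpcomm_collective_t);
coll->id = 1;   /* example: any id other than orte_process_info.peer_modex */
if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
    error = "orte_grpcomm_modex failed";
    goto error;
}
/* wait for this modex to complete */
while (coll->active) {
    opal_progress();    /* block in progress pending events */
}
OBJ_RELEASE(coll);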
On Thu, Dec 20, 2012 at 10:39 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Yeah, that won't work. The id's cannot be reused, so you'd have to assign
> a different one in each case.
>
> On Dec 20, 2012, at 9:12 AM, Victor Kocheganov <
> victor.kochega...@itseez.com> wrote:
>
> In every 'modex' block I use the coll->id = orte_process_info.peer_modex;
> id, and in every 'barrier' block I use the coll->id =
> orte_process_info.peer_init_barrier; id.
>
> P.s. In general (as I wrote in the first letter), I use the term 'modex'
> for the following code:
>
> coll = OBJ_NEW(orte_grpcomm_collective_t);
> coll->id = orte_process_info.peer_modex;
> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>     error = "orte_grpcomm_modex failed";
>     goto error;
> }
> /* wait for modex to complete - this may be moved anywhere in mpi_init
>  * so long as it occurs prior to calling a function that needs
>  * the modex info!
>  */
> while (coll->active) {
>     opal_progress();  /* block in progress pending events */
> }
> OBJ_RELEASE(coll);
>
> and 'barrier' for this:
>
> coll = OBJ_NEW(orte_grpcomm_collective_t);
> coll->id = orte_process_info.peer_init_barrier;
> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>     error = "orte_grpcomm_barrier failed";
>     goto error;
> }
> /* wait for barrier to complete */
> while (coll->active) {
>     opal_progress();  /* block in progress pending events */
> }
> OBJ_RELEASE(coll);
>
> On Thu, Dec 20, 2012 at 8:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Dec 20, 2012, at 8:29 AM, Victor Kocheganov <
>> victor.kochega...@itseez.com> wrote:
>>
>> Thanks for the fast answer, Ralph.
>>
>> In my example I use different collective objects. I mean that in every
>> mentioned block I call coll = OBJ_NEW(orte_grpcomm_collective_t) and
>> OBJ_RELEASE(coll), so all the grpcomm operations use a unique
>> collective object.
>>
>> How are the procs getting the collective id for those new calls? They all
>> have to match.
>>
>> On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Absolutely it will hang, as the collective object passed into any grpcomm
>>> operation (modex or barrier) is only allowed to be used once - any attempt
>>> to reuse it will fail.
>>>
>>> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <
>>> victor.kochega...@itseez.com> wrote:
>>>
>>> Hi.
>>>
>>> I have an issue with understanding the ompi_mpi_init() logic. Could you
>>> please tell me if you have any guesses about the following behavior.
>>>
>>> If I understand right, there is a block in the ompi_mpi_init() function
>>> for exchanging procs' information between processes (denote this block
>>> 'modex'):
>>>
>>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>>> coll->id = orte_process_info.peer_modex;
>>> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>>     error = "orte_grpcomm_modex failed";
>>>     goto error;
>>> }
>>> /* wait for modex to complete - this may be moved anywhere in mpi_init
>>>  * so long as it occurs prior to calling a function that needs
>>>  * the modex info!
>>>  */
>>> while (coll->active) {
>>>     opal_progress();  /* block in progress pending events */
>>> }
>>> OBJ_RELEASE(coll);
>>>
>>> and several instructions after this there is a block for process
>>> synchronization (denote this block 'barrier'):
>>>
>>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>>> coll->id = orte_process_info.peer_init_barrier;
>>> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>>     error = "orte_grpcomm_barrier failed";
>>>     goto error;
>>> }
>>> /* wait for barrier to complete */
>>> while (coll->active) {
>>>     opal_progress();  /* block in progress pending events */
>>> }
>>> OBJ_RELEASE(coll);
>>>
>>> So, initially ompi_mpi_init() has the following structure:
>>>
>>> ...
>>> 'modex' block;
>>> ...
>>> 'barrier' block;
>>> ...
>>>
>>> I made several experiments with this code, and the following one is of
>>> interest: if I add a sequence of two additional blocks, 'barrier' and
>>> 'modex', right after the 'modex' block, then ompi_mpi_init() hangs in
>>> opal_progress() of the last 'modex' block.
>>>
>>> ...
>>> 'modex' block;
>>> 'barrier' block;
>>> 'modex' block; <- hangs
>>> ...
>>> 'barrier' block;
>>> ...
>>>
>>> Thanks,
>>> Victor Kocheganov.