Thanks for the fast answer, Ralph. In my example I use different collective objects: in every block mentioned I call *coll = OBJ_NEW(orte_grpcomm_collective_t);* and *OBJ_RELEASE(coll);*, so every grpcomm operation uses its own collective object.
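To pin down the structure I mean, here is a minimal sketch of the block sequence from my experiment. It does not use ORTE at all: mock_collective_t, mock_start(), mock_progress(), run_block(), run_sequence() and the MOCK_* ids are hypothetical stand-ins I made up for illustration, and the mock completes every operation on the first progress call, so it cannot reproduce the hang. It only shows that each block allocates, waits on, and releases its own object, while the repeated blocks carry the same id values as in the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical ids standing in for orte_process_info.peer_modex and
 * orte_process_info.peer_init_barrier; the values are arbitrary. */
enum { MOCK_PEER_MODEX = 1, MOCK_PEER_INIT_BARRIER = 2 };

/* Hypothetical stand-in for orte_grpcomm_collective_t: only the two
 * fields used in the quoted blocks are modelled. */
typedef struct {
    int  id;
    bool active;
} mock_collective_t;

/* The single operation currently in flight in this mock. */
static mock_collective_t *mock_pending = NULL;

/* Stand-in for orte_grpcomm.modex() / orte_grpcomm.barrier():
 * marks the collective active and remembers it. */
static int mock_start(mock_collective_t *coll)
{
    coll->active = true;
    mock_pending = coll;
    return 0;
}

/* Stand-in for opal_progress(): completes the pending operation. */
static void mock_progress(void)
{
    if (NULL != mock_pending) {
        mock_pending->active = false;
        mock_pending = NULL;
    }
}

/* One 'modex' or 'barrier' block: fresh object, start, wait, release. */
static int run_block(int id)
{
    mock_collective_t *coll = calloc(1, sizeof(*coll)); /* like OBJ_NEW */
    if (NULL == coll) {
        return -1;
    }
    coll->id = id;
    if (0 != mock_start(coll)) {
        free(coll);
        return -1;
    }
    while (coll->active) {
        mock_progress(); /* block in progress pending events */
    }
    assert(!coll->active);
    free(coll); /* like OBJ_RELEASE */
    return 0;
}

/* The sequence from the experiment: modex, barrier, modex, barrier.
 * Returns the number of blocks that completed. */
int run_sequence(void)
{
    const int ids[] = { MOCK_PEER_MODEX, MOCK_PEER_INIT_BARRIER,
                        MOCK_PEER_MODEX, MOCK_PEER_INIT_BARRIER };
    int completed = 0;
    for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
        if (0 != run_block(ids[i])) {
            break;
        }
        completed++;
    }
    return completed;
}
```

Calling run_sequence() from a main() runs the four blocks in order. In the real ompi_mpi_init() it is the third block (the second 'modex') that never leaves its wait loop.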
On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Absolutely it will hang, as the collective object passed into any grpcomm
> operation (modex or barrier) is only allowed to be used once - any attempt
> to reuse it will fail.
>
>
> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <
> victor.kochega...@itseez.com> wrote:
>
> Hi.
>
> I have an issue with understanding the *ompi_mpi_init()* logic. Could you
> please tell me if you have any guesses about the following behavior.
>
> I wonder if I understand it right: there is a block in the *ompi_mpi_init()*
> function for exchanging procs information between processes (denote this
> block 'modex'):
>
>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>     coll->id = orte_process_info.peer_modex;
>     if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>         error = "orte_grpcomm_modex failed";
>         goto error;
>     }
>     /* wait for modex to complete - this may be moved anywhere in mpi_init
>      * so long as it occurs prior to calling a function that needs
>      * the modex info!
>      */
>     while (coll->active) {
>         opal_progress(); /* block in progress pending events */
>     }
>     OBJ_RELEASE(coll);
>
> and several instructions after this there is a block for process
> synchronization (denote this block 'barrier'):
>
>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>     coll->id = orte_process_info.peer_init_barrier;
>     if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>         error = "orte_grpcomm_barrier failed";
>         goto error;
>     }
>     /* wait for barrier to complete */
>     while (coll->active) {
>         opal_progress(); /* block in progress pending events */
>     }
>     OBJ_RELEASE(coll);
>
> So, initially *ompi_mpi_init()* has the following structure:
>
>     ...
>     'modex' block;
>     ...
>     'barrier' block;
>     ...
>
> I made several experiments with this code and the following one is of
> interest: if I add a sequence of two additional blocks, 'barrier' and
> 'modex', right after the 'modex' block, then *ompi_mpi_init()* hangs in
> *opal_progress()* of the last 'modex' block.
>
>     ...
>     'modex' block;
>     'barrier' block;
>     'modex' block; <- hangs
>     ...
>     'barrier' block;
>     ...
>
> Thanks,
> Victor Kocheganov.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel