Thanks for the help. Everything works as you said.
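In case it is useful to anyone reading the archives later, this is roughly what the working version looks like: a minimal sketch of two consecutive modex operations inside ompi_mpi_init(), each with its own collective object and its own collective id. It reuses the coll, ret, and error-label context that is already there (see the quoted blocks below); MY_SECOND_COLL_ID is only a placeholder for whatever unused id the caller picks, agreed on by every participating proc.

    /* first modex: the standard peer_modex id, as in ompi_mpi_init() */
    coll = OBJ_NEW(orte_grpcomm_collective_t);
    coll->id = orte_process_info.peer_modex;
    if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
        error = "orte_grpcomm_modex failed";
        goto error;
    }
    while (coll->active) {
        opal_progress();  /* block in progress pending events */
    }
    OBJ_RELEASE(coll);

    /* second modex: a fresh collective object AND a different, so-far-unused id.
     * MY_SECOND_COLL_ID is a placeholder - the only requirements are that all
     * procs agree on the value and that no other collective uses it. */
    coll = OBJ_NEW(orte_grpcomm_collective_t);
    coll->id = MY_SECOND_COLL_ID;
    if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
        error = "orte_grpcomm_modex failed";
        goto error;
    }
    while (coll->active) {
        opal_progress();  /* block in progress pending events */
    }
    OBJ_RELEASE(coll);

The same pattern works for an extra barrier: a fresh orte_grpcomm_collective_t plus a collective id that no earlier operation has used.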
On Fri, Dec 21, 2012 at 7:11 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Don't know how many times I can repeat it, but I'll try again: you are not
> allowed to reuse a collective id. If it happens to work, it's by accident.
>
> If you want to implement multiple modex/barrier operations, they each need
> to have their own unique collective id.
>
>
> On Dec 20, 2012, at 9:28 PM, Victor Kocheganov <victor.kochega...@itseez.com> wrote:
>
> Actually, if I reuse ids in equivalent calls like this:
>
> ...
> 'modex' block;
> 'modex' block;
> 'modex' block;
> ...
>
> or
>
> ...
> 'barrier' block;
> 'barrier' block;
> 'barrier' block;
> ...
>
> there is no hanging. The hang only occurs if this "reuse" follows the use
> of another collective id, in the way I wrote in the first letter:
>
> ...
> 'modex' block;
> 'barrier' block;
> 'modex' block; <- hangs
> ...
>
> or in this way:
>
> ...
> 'barrier' block;
> 'modex' block;
> 'barrier' block; <- hangs
> ...
>
> If I use a different collective id when calling modex (1, 2, ..., but not
> 0 == orte_process_info.peer_modex), that also doesn't work, unfortunately.
>
>
> On Thu, Dec 20, 2012 at 10:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Yeah, that won't work. The ids cannot be reused, so you'd have to assign
>> a different one in each case.
>>
>> On Dec 20, 2012, at 9:12 AM, Victor Kocheganov <victor.kochega...@itseez.com> wrote:
>>
>> In every 'modex' block I use the coll->id = orte_process_info.peer_modex;
>> id, and in every 'barrier' block I use the coll->id =
>> orte_process_info.peer_init_barrier; id.
>>
>> P.S. In general (as I wrote in the first letter), I use the term 'modex'
>> for the following code:
>>
>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>> coll->id = orte_process_info.peer_modex;
>> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>     error = "orte_grpcomm_modex failed";
>>     goto error;
>> }
>> /* wait for modex to complete - this may be moved anywhere in mpi_init
>>  * so long as it occurs prior to calling a function that needs
>>  * the modex info!
>>  */
>> while (coll->active) {
>>     opal_progress();  /* block in progress pending events */
>> }
>> OBJ_RELEASE(coll);
>>
>> and 'barrier' for this:
>>
>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>> coll->id = orte_process_info.peer_init_barrier;
>> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>     error = "orte_grpcomm_barrier failed";
>>     goto error;
>> }
>> /* wait for barrier to complete */
>> while (coll->active) {
>>     opal_progress();  /* block in progress pending events */
>> }
>> OBJ_RELEASE(coll);
>>
>> On Thu, Dec 20, 2012 at 8:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> On Dec 20, 2012, at 8:29 AM, Victor Kocheganov <victor.kochega...@itseez.com> wrote:
>>>
>>> Thanks for the fast answer, Ralph.
>>>
>>> In my example I use different collective objects. I mean that in every
>>> mentioned block I call coll = OBJ_NEW(orte_grpcomm_collective_t); and
>>> OBJ_RELEASE(coll);, so each grpcomm operation uses a unique collective
>>> object.
>>>
>>>
>>> How are the procs getting the collective id for those new calls? They
>>> all have to match.
>>>
>>>
>>> On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Absolutely it will hang, as the collective object passed into any
>>>> grpcomm operation (modex or barrier) is only allowed to be used once -
>>>> any attempt to reuse it will fail.
>>>>
>>>>
>>>> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <victor.kochega...@itseez.com> wrote:
>>>>
>>>> Hi.
>>>>
>>>> I have an issue with understanding the ompi_mpi_init() logic. Could you
>>>> please tell me if you have any guesses about the following behavior.
>>>>
>>>> If I understand right, there is a block in the ompi_mpi_init() function
>>>> for exchanging proc information between processes (denote this block
>>>> 'modex'):
>>>>
>>>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>>>> coll->id = orte_process_info.peer_modex;
>>>> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>>>     error = "orte_grpcomm_modex failed";
>>>>     goto error;
>>>> }
>>>> /* wait for modex to complete - this may be moved anywhere in mpi_init
>>>>  * so long as it occurs prior to calling a function that needs
>>>>  * the modex info!
>>>>  */
>>>> while (coll->active) {
>>>>     opal_progress();  /* block in progress pending events */
>>>> }
>>>> OBJ_RELEASE(coll);
>>>>
>>>> and several instructions after this there is a block for process
>>>> synchronization (denote this block 'barrier'):
>>>>
>>>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>>>> coll->id = orte_process_info.peer_init_barrier;
>>>> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>>>     error = "orte_grpcomm_barrier failed";
>>>>     goto error;
>>>> }
>>>> /* wait for barrier to complete */
>>>> while (coll->active) {
>>>>     opal_progress();  /* block in progress pending events */
>>>> }
>>>> OBJ_RELEASE(coll);
>>>>
>>>> So, initially ompi_mpi_init() has the following structure:
>>>>
>>>> ...
>>>> 'modex' block;
>>>> ...
>>>> 'barrier' block;
>>>> ...
>>>>
>>>> I made several experiments with this code, and the following one is of
>>>> interest: if I add a sequence of two additional blocks, 'barrier' and
>>>> 'modex', right after the 'modex' block, then ompi_mpi_init() hangs in
>>>> opal_progress() of the last 'modex' block.
>>>>
>>>> ...
>>>> 'modex' block;
>>>> 'barrier' block;
>>>> 'modex' block; <- hangs
>>>> ...
>>>> 'barrier' block;
>>>> ...
>>>>
>>>> Thanks,
>>>> Victor Kocheganov.