Thanks for the fast answer, Ralph.

In my example I use different collective objects. I mean that in every block
I mentioned I call *coll = OBJ_NEW(orte_grpcomm_collective_t);*
and *OBJ_RELEASE(coll);*, so each grpcomm operation uses its own unique
collective object.
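
To make the pattern concrete, here is a minimal self-contained sketch of what
each of my blocks does. It does not use the real ORTE code: mock_collective_t,
mock_new, mock_start, mock_progress, run_block and reuse_same_object are all
hypothetical stand-ins, and the 'used' flag is only my assumption of how the
"use once" rule could be modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for orte_grpcomm_collective_t (NOT the real type). */
typedef struct {
    int  id;      /* which collective this object belongs to */
    bool active;  /* true while the operation is still in flight */
    bool used;    /* assumption: models the "use only once" rule */
} mock_collective_t;

/* Mock of OBJ_NEW followed by setting coll->id. */
static mock_collective_t *mock_new(int id)
{
    mock_collective_t *coll = calloc(1, sizeof(*coll));
    if (coll != NULL) {
        coll->id = id;
    }
    return coll;
}

/* Mock of orte_grpcomm.modex()/barrier(): 0 on success, -1 on reuse. */
static int mock_start(mock_collective_t *coll)
{
    assert(coll != NULL);
    if (coll->used) {
        return -1;
    }
    coll->used = true;
    coll->active = true;
    return 0;
}

/* Mock of opal_progress(): here the operation completes on the first call. */
static void mock_progress(mock_collective_t *coll)
{
    coll->active = false;
}

/* One 'modex' or 'barrier' block as in my test: new object, wait, release. */
static int run_block(int id)
{
    mock_collective_t *coll = mock_new(id);
    if (coll == NULL) {
        return -1;
    }
    int rc = mock_start(coll);
    while (rc == 0 && coll->active) {
        mock_progress(coll);
    }
    free(coll);  /* stands in for OBJ_RELEASE(coll) */
    return rc;
}

/* The case Ralph describes: one object passed to a second operation. */
static int reuse_same_object(int id)
{
    mock_collective_t *coll = mock_new(id);
    if (coll == NULL) {
        return -1;
    }
    int rc = mock_start(coll);
    while (rc == 0 && coll->active) {
        mock_progress(coll);
    }
    rc = mock_start(coll);  /* second use of the same object */
    free(coll);
    return rc;
}
```

In this mock, calling run_block() several times in a row succeeds every time,
because each call allocates a fresh object, while reuse_same_object() fails on
the second start. My experiment corresponds to the first case, not the second.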


On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Absolutely it will hang as the collective object passed into any grpcomm
> operation (modex or barrier) is only allowed to be used once - any attempt
> to reuse it will fail.
>
>
> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <
> victor.kochega...@itseez.com> wrote:
>
>  Hi.
>
> I am having trouble understanding the logic of *ompi_mpi_init()*. Could you
> please tell me if you have any guesses about the following behavior?
>
> If I understand correctly, there is a block in the *ompi_mpi_init()* function
> for exchanging proc information between processes (call this block
> 'modex'):
>
>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>     coll->id = orte_process_info.peer_modex;
>     if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>         error = "orte_grpcomm_modex failed";
>         goto error;
>     }
>     /* wait for modex to complete - this may be moved anywhere in mpi_init
>      * so long as it occurs prior to calling a function that needs
>      * the modex info!
>      */
>     while (coll->active) {
>         opal_progress();  /* block in progress pending events */
>     }
>     OBJ_RELEASE(coll);
>
> and several instructions after this there is a block for process
> synchronization (call this block 'barrier'):
>
>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>     coll->id = orte_process_info.peer_init_barrier;
>     if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>         error = "orte_grpcomm_barrier failed";
>         goto error;
>     }
>     /* wait for barrier to complete */
>     while (coll->active) {
>         opal_progress();  /* block in progress pending events */
>     }
>     OBJ_RELEASE(coll);
>
> So, initially *ompi_mpi_init()* has the following structure:
>
> ...
> 'modex' block;
> ...
> 'barrier' block;
> ...
>
> I ran several experiments with this code, and the following one is of
> interest: if I add a sequence of two additional blocks, 'barrier' and
> 'modex', right after the 'modex' block, then *ompi_mpi_init()* hangs in
> *opal_progress()* of the last 'modex' block.
>
> ...
> 'modex' block;
> 'barrier' block;
> 'modex' block; <- hangs
> ...
> 'barrier' block;
> ...
>
> Thanks,
> Victor Kocheganov.
>  _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
