In every 'modex' block I set coll->id = orte_process_info.peer_modex; and in
every 'barrier' block I set coll->id = orte_process_info.peer_init_barrier;.

P.S. In general (as I wrote in my first letter), I use the term 'modex' for
the following code:
    coll = OBJ_NEW(orte_grpcomm_collective_t);
    coll->id = orte_process_info.peer_modex;
    if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
        error = "orte_grpcomm_modex failed";
        goto error;
    }
    /* wait for modex to complete - this may be moved anywhere in mpi_init
     * so long as it occurs prior to calling a function that needs
     * the modex info!
     */
    while (coll->active) {
        opal_progress();  /* block in progress pending events */
    }
    OBJ_RELEASE(coll);

and 'barrier' for this:

    coll = OBJ_NEW(orte_grpcomm_collective_t);
    coll->id = orte_process_info.peer_init_barrier;
    if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
        error = "orte_grpcomm_barrier failed";
        goto error;
    }
    /* wait for barrier to complete */
    while (coll->active) {
        opal_progress();  /* block in progress pending events */
    }
    OBJ_RELEASE(coll);
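
Just for reference, both blocks share the same pattern, so they could be
folded into one helper. This is only a sketch, not code from the tree: the
name run_grpcomm_collective() is mine, and I assume the collective id fits in
an int32_t (use whatever type coll->id actually has); everything else is the
code quoted above.

    /* sketch only: a fresh collective object per call, with the caller
     * supplying the id, which must match on every participating proc */
    static int run_grpcomm_collective(int32_t id, bool is_modex)
    {
        int ret;
        orte_grpcomm_collective_t *coll = OBJ_NEW(orte_grpcomm_collective_t);

        coll->id = id;
        ret = is_modex ? orte_grpcomm.modex(coll) : orte_grpcomm.barrier(coll);
        if (ORTE_SUCCESS != ret) {
            /* in ompi_mpi_init() this is where the code does 'goto error' */
            return ret;
        }
        /* wait for the collective to complete */
        while (coll->active) {
            opal_progress();  /* block in progress pending events */
        }
        OBJ_RELEASE(coll);
        return ORTE_SUCCESS;
    }

With that, the original sequence is run_grpcomm_collective with
orte_process_info.peer_modex followed by run_grpcomm_collective with
orte_process_info.peer_init_barrier; my extra blocks reuse the same two ids,
which I guess is what your question about matching ids refers to.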

On Thu, Dec 20, 2012 at 8:57 PM, Ralph Castain <r...@open-mpi.org> wrote:

>
> On Dec 20, 2012, at 8:29 AM, Victor Kocheganov <
> victor.kochega...@itseez.com> wrote:
>
> Thanks for the fast answer, Ralph.
>
> In my example I use different collective objects. I mean in every
> mentioned block I call coll = OBJ_NEW(orte_grpcomm_collective_t); and
> OBJ_RELEASE(coll);, so all the grpcomm operations use a unique
> collective object.
>
>
> How are the procs getting the collective id for those new calls? They all
> have to match.
>
>
>
> On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Absolutely it will hang as the collective object passed into any grpcomm
>> operation (modex or barrier) is only allowed to be used once - any attempt
>> to reuse it will fail.
>>
>>
>> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <
>> victor.kochega...@itseez.com> wrote:
>>
>>   Hi.
>>
>> I have an issue with understanding the ompi_mpi_init() logic. Could you
>> please tell me if you have any guesses about the following behavior.
>>
>> I wonder if I understand right: there is a block in the ompi_mpi_init()
>> function for exchanging procs' information between processes (denote this
>> block 'modex'):
>>
>>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>>     coll->id = orte_process_info.peer_modex;
>>     if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>         error = "orte_grpcomm_modex failed";
>>         goto error;
>>     }
>>     /* wait for modex to complete - this may be moved anywhere in mpi_init
>>      * so long as it occurs prior to calling a function that needs
>>      * the modex info!
>>      */
>>     while (coll->active) {
>>         opal_progress();  /* block in progress pending events */
>>     }
>>     OBJ_RELEASE(coll);
>>
>> and several instructions after this there is a block for process
>> synchronization (denote this block 'barrier'):
>>
>>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>>     coll->id = orte_process_info.peer_init_barrier;
>>     if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>         error = "orte_grpcomm_barrier failed";
>>         goto error;
>>     }
>>     /* wait for barrier to complete */
>>     while (coll->active) {
>>         opal_progress();  /* block in progress pending events */
>>     }
>>     OBJ_RELEASE(coll);
>>
>> So, initially ompi_mpi_init() has the following structure:
>>
>> ...
>> 'modex' block;
>> ...
>> 'barrier' block;
>> ...
>>
>> I made several experiments with this code and the following one is of
>> interest: if I add a sequence of two additional blocks, 'barrier' and
>> 'modex', right after the 'modex' block, then ompi_mpi_init() hangs in
>> opal_progress() of the last 'modex' block.
>>
>> ...
>> 'modex' block;
>> 'barrier' block;
>> 'modex' block; <- hangs
>> ...
>> 'barrier' block;
>> ...
>>
>> Thanks,
>> Victor Kocheganov.