Thanks for the help. Everything works as you said.
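
For anyone reading this thread later: the rule from Ralph's reply quoted below (every modex/barrier operation needs its own collective id, and an id is never reused) can be pictured with a small toy model. This is only a sketch and not ORTE code. The names coll_tracker_t, tracker_init, tracker_new_id, tracker_begin and MAX_IDS are hypothetical and exist only for illustration. The model does not explain why the real code hangs; it only shows the bookkeeping idea of a single-use id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 64  /* hypothetical capacity of the toy tracker */

/* Toy stand-in for collective-id bookkeeping; NOT an ORTE type. */
typedef struct {
    bool used[MAX_IDS];  /* used[i] becomes true once id i has been consumed */
    int  next_free;      /* lowest id that may still be unused */
} coll_tracker_t;

/* Mark every id as unused. */
static void tracker_init(coll_tracker_t *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_IDS; i++) {
        t->used[i] = false;
    }
    t->next_free = 0;
}

/* Hand out an id that has not been consumed yet; returns -1 when exhausted. */
static int tracker_new_id(coll_tracker_t *t)
{
    assert(t != NULL);
    while (t->next_free < MAX_IDS && t->used[t->next_free]) {
        t->next_free++;
    }
    if (t->next_free >= MAX_IDS) {
        return -1;
    }
    return t->next_free++;
}

/* Start one collective (modex or barrier) on the given id.
 * Returns true on first use, false if the id is out of range
 * or has already been consumed by an earlier collective. */
static bool tracker_begin(coll_tracker_t *t, int id)
{
    assert(t != NULL);
    if (id < 0 || id >= MAX_IDS || t->used[id]) {
        return false;
    }
    t->used[id] = true;
    return true;
}
```

In this model, the sequence modex / barrier / modex succeeds only when the second modex is given a fresh id from tracker_new_id(); passing the first modex id again makes tracker_begin() return false. In the real code all processes also have to agree on the id used for a given operation, which this single-process sketch does not model.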

On Fri, Dec 21, 2012 at 7:11 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Don't know how many times I can repeat it, but I'll try again: you are not
> allowed to reuse a collective id. If it happens to work, it's by accident.
>
> If you want to implement multiple modex/barrier operations, they each need
> to have their own unique collective id.
>
>
> On Dec 20, 2012, at 9:28 PM, Victor Kocheganov <
> victor.kochega...@itseez.com> wrote:
>
> Actually, if I reuse id's in equivalent calls like this:
>
> ...
> 'modex' block;
> 'modex' block;
> 'modex' block;
> ...
>
> or
>
> ...
> 'barrier' block;
> 'barrier' block;
> 'barrier' block;
> ...
>
> there is no hang. The hang occurs only when the reuse follows a call that
> used a different collective id, as I described in my first message:
>
> ...
> 'modex' block;
> 'barrier' block;
> 'modex' block; <- hangs
> ...
>
> or in this way
>
> ...
> 'barrier' block;
> 'modex' block;
> 'barrier' block; <- hangs
> ...
>
>
> If I use a different collective id when calling modex (1, 2, ..., but not
> 0 == orte_process_info.peer_modex), that unfortunately does not work either.
>
>
>
> On Thu, Dec 20, 2012 at 10:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Yeah, that won't work. The id's cannot be reused, so you'd have to assign
>> a different one in each case.
>>
>> On Dec 20, 2012, at 9:12 AM, Victor Kocheganov <
>> victor.kochega...@itseez.com> wrote:
>>
>> In every 'modex' block I set coll->id = orte_process_info.peer_modex;
>> and in every 'barrier' block I set coll->id =
>> orte_process_info.peer_init_barrier;
>>
>> P.S. In general (as I wrote in my first message), I use the term 'modex'
>> for the following code:
>>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>>     coll->id = orte_process_info.peer_modex;
>>     if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>         error = "orte_grpcomm_modex failed";
>>         goto error;
>>     }
>>     /* wait for modex to complete - this may be moved anywhere in mpi_init
>>      * so long as it occurs prior to calling a function that needs
>>      * the modex info!
>>      */
>>     while (coll->active) {
>>         opal_progress();  /* block in progress pending events */
>>     }
>>     OBJ_RELEASE(coll);
>>
>> and 'barrier' for this:
>>
>>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>>     coll->id = orte_process_info.peer_init_barrier;
>>     if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>         error = "orte_grpcomm_barrier failed";
>>         goto error;
>>     }
>>     /* wait for barrier to complete */
>>     while (coll->active) {
>>         opal_progress();  /* block in progress pending events */
>>     }
>>     OBJ_RELEASE(coll);
>>
>> On Thu, Dec 20, 2012 at 8:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>>
>>> On Dec 20, 2012, at 8:29 AM, Victor Kocheganov <
>>> victor.kochega...@itseez.com> wrote:
>>>
>>> Thanks for the fast answer, Ralph.
>>>
>>> In my example I use different collective objects. That is, in every
>>> block mentioned I call coll = OBJ_NEW(orte_grpcomm_collective_t);
>>> and OBJ_RELEASE(coll); so each grpcomm operation uses its own
>>> collective object.
>>>
>>>
>>> How are the procs getting the collective id for those new calls? They
>>> all have to match.
>>>
>>>
>>>
>>> On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Absolutely it will hang as the collective object passed into any
>>>> grpcomm operation (modex or barrier) is only allowed to be used once - any
>>>> attempt to reuse it will fail.
>>>>
>>>>
>>>> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <
>>>> victor.kochega...@itseez.com> wrote:
>>>>
>>>>   Hi.
>>>>
>>>> I am having trouble understanding the logic of ompi_mpi_init(). Could you
>>>> please tell me if you have any idea what causes the following behavior?
>>>>
>>>> If I understand correctly, there is a block in the ompi_mpi_init()
>>>> function for exchanging proc information between processes (call this
>>>> block 'modex'):
>>>>
>>>>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>>>>     coll->id = orte_process_info.peer_modex;
>>>>     if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>>>         error = "orte_grpcomm_modex failed";
>>>>         goto error;
>>>>     }
>>>>     /* wait for modex to complete - this may be moved anywhere in mpi_init
>>>>      * so long as it occurs prior to calling a function that needs
>>>>      * the modex info!
>>>>      */
>>>>     while (coll->active) {
>>>>         opal_progress();  /* block in progress pending events */
>>>>     }
>>>>     OBJ_RELEASE(coll);
>>>>
>>>> and several instructions later there is a block for process
>>>> synchronization (call this block 'barrier'):
>>>>
>>>>     coll = OBJ_NEW(orte_grpcomm_collective_t);
>>>>     coll->id = orte_process_info.peer_init_barrier;
>>>>     if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>>>         error = "orte_grpcomm_barrier failed";
>>>>         goto error;
>>>>     }
>>>>     /* wait for barrier to complete */
>>>>     while (coll->active) {
>>>>         opal_progress();  /* block in progress pending events */
>>>>     }
>>>>     OBJ_RELEASE(coll);
>>>>
>>>> So, initially ompi_mpi_init() has the following structure:
>>>>
>>>> ...
>>>> 'modex' block;
>>>> ...
>>>> 'barrier' block;
>>>> ...
>>>>
>>>> I ran several experiments with this code, and the following one is of
>>>> interest: if I add a sequence of two additional blocks, 'barrier' and
>>>> 'modex', right after the 'modex' block, then ompi_mpi_init() hangs in
>>>> opal_progress() of the last 'modex' block.
>>>>
>>>> ...
>>>> 'modex' block;
>>>> 'barrier' block;
>>>> 'modex' block; <- hangs
>>>> ...
>>>> 'barrier' block;
>>>> ...
>>>>
>>>> Thanks,
>>>> Victor Kocheganov.
>>>>  _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
