On Tue, Sep 18, 2007 at 10:57:38AM -0400, George Bosilca wrote:
> More information about this can be found in trac ticket #1127
> (https://svn.open-mpi.org/trac/ompi/ticket/1127).
>
OK. So the code I cited is only a temporary solution. Thanks.

>   george.
>
> On Sep 18, 2007, at 10:20 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 18, 2007 at 09:44:42AM -0400, George Bosilca wrote:
>>> The setup of a communicator includes, as its last stage, a collective
>>> communication. As a result, some of the nodes can exit the collective
>>> before the others and can therefore start sending messages using this
>>> communicator [while some of the other nodes are still waiting for the
>>> collective to complete]. This leads to a situation where a node receives
>>> a message for a communicator that it is still building.
>>>
>>> There is a bug filed in trac about this. In FT-MPI we temporarily put
>>> these messages in an internal queue, and deliver them to the right
>>> communicator only once this communicator is completely created.
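[Editorial sketch, not part of the original message.] The queueing approach
described above can be illustrated roughly as follows. This is a minimal,
single-process sketch and not the FT-MPI or Open MPI implementation: every
name in it (proc_state_t, on_fragment, on_comm_created, the MAX_* limits) is
hypothetical, and it assumes fragments are identified only by an integer
communicator id (cid) and a short text payload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMMS   16   /* hypothetical limit on communicator ids */
#define MAX_PENDING 32   /* hypothetical limit on parked/delivered fragments */
#define MAX_PAYLOAD 64   /* hypothetical payload size */

/* One incoming fragment, tagged with the communicator id it targets. */
typedef struct {
    int  cid;
    char payload[MAX_PAYLOAD];
} fragment_t;

/* Per-process state: which cids are completely created, a FIFO of
   fragments that arrived too early, and a log of delivered fragments. */
typedef struct {
    int        active[MAX_COMMS];
    fragment_t pending[MAX_PENDING];
    size_t     npending;
    fragment_t delivered[MAX_PENDING];
    size_t     ndelivered;
} proc_state_t;

static void state_init(proc_state_t *s) { memset(s, 0, sizeof *s); }

/* Stand-in for handing a fragment to the communicator's matching logic. */
static void deliver(proc_state_t *s, const fragment_t *f)
{
    assert(s->ndelivered < MAX_PENDING);
    s->delivered[s->ndelivered++] = *f;
}

/* Called for every incoming fragment.
   Returns 1 if it was delivered immediately, 0 if it was parked. */
static int on_fragment(proc_state_t *s, int cid, const char *data)
{
    fragment_t f;
    assert(cid >= 0 && cid < MAX_COMMS);
    f.cid = cid;
    strncpy(f.payload, data, MAX_PAYLOAD - 1);
    f.payload[MAX_PAYLOAD - 1] = '\0';

    if (s->active[cid]) {
        deliver(s, &f);
        return 1;
    }
    /* Communicator not known yet: keep the fragment for later. */
    assert(s->npending < MAX_PENDING);
    s->pending[s->npending++] = f;
    return 0;
}

/* Called once the communicator is completely created. Marks the cid
   active and delivers its parked fragments in arrival order.
   Returns the number of fragments flushed. */
static int on_comm_created(proc_state_t *s, int cid)
{
    size_t i, keep = 0;
    int flushed = 0;
    assert(cid >= 0 && cid < MAX_COMMS);
    s->active[cid] = 1;
    for (i = 0; i < s->npending; i++) {
        if (s->pending[i].cid == cid) {
            deliver(s, &s->pending[i]);
            flushed++;
        } else {
            /* Compact fragments that belong to other, still unknown cids. */
            s->pending[keep++] = s->pending[i];
        }
    }
    s->npending = keep;
    return flushed;
}
```

In this sketch a receiver would call on_fragment() for each arriving fragment
and on_comm_created() when its local setup of the communicator finishes, so
early fragments are neither dropped nor delivered before the communicator
exists.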
>> In the ompi_comm_nextcid() function there is this code for the
>> thread_multiple case:
>>
>>  /* for synchronization purposes, avoids receiving fragments for
>>     a communicator id, which might not yet been known. For single-threaded
>>     scenarios, this call is in ompi_comm_activate, for multi-threaded
>>     scenarios, it has to be already here ( before releasing another
>>     thread into the cid-allocation loop ) */
>>  (allredfnct)(&response, &glresponse, 1, MPI_MIN, comm, bridgecomm,
>>                      local_leader, remote_leader, send_first );
>>
>> This collective is executed on the old communicator after the new cid
>> has been set up. Is this not enough to solve the problem? Some ranks may
>> leave this collective call earlier than others, but none can leave it
>> before all ranks have entered it, and at that stage the new communicator
>> already exists on all of them. Am I missing something?
>>
>>
>>>
>>>   george.
>>>
>>> On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote:
>>>
>>>> George,
>>>>
>>>>     In the comment you are saying that "a message for a not yet existing
>>>> communicator can happen". Can you explain in what situation it can
>>>> happen?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>>                    Gleb.
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>>
>>
>> --
>>                      Gleb.
>




--
                        Gleb.
